THE LOGARITHMICALLY AVERAGED CHOWLA
AND ELLIOTT CONJECTURES FOR TWO-POINT
CORRELATIONS
Abstract. Let $\lambda$ denote the Liouville function. The Chowla
conjecture, in the two-point correlation case, asserts that

\[ \sum_{n \le x} \lambda(a_1 n + b_1) \lambda(a_2 n + b_2) = o(x) \]

as $x \to \infty$, for any fixed natural numbers $a_1, a_2$ and non-negative
integers $b_1, b_2$ with $a_1 b_2 - a_2 b_1 \neq 0$. In this paper we establish the
logarithmically averaged version

\[ \sum_{x/\omega(x) < n \le x} \frac{\lambda(a_1 n + b_1) \lambda(a_2 n + b_2)}{n} = o(\log \omega(x)) \]

of the Chowla conjecture as $x \to \infty$, where $1 \le \omega(x) \le x$ is an
arbitrary function of $x$ that goes to infinity as $x \to \infty$, thus
breaking the "parity barrier" for this problem. Our main tools are the
multiplicativity of the Liouville function at small primes, a recent
result of Matomäki, Radziwiłł, and the author on the averages of
modulated multiplicative functions in short intervals, concentration
of measure inequalities, the Hardy-Littlewood circle method
combined with a restriction theorem for the primes, and a novel
"entropy decrement argument". Most of these ingredients are also
available (in principle, at least) for the higher order correlations,
with the main missing ingredient being the need to control short
sums of multiplicative functions modulated by local nilsequences.

Our arguments also extend to more general bounded multiplicative
functions than the Liouville function $\lambda$, leading to a
logarithmically averaged version of the Elliott conjecture in the two-point
case. In a subsequent paper we will use this version of the Elliott
conjecture to affirmatively settle the Erdős discrepancy problem.


1. Introduction


Let $\lambda$ denote the Liouville function, thus $\lambda$ is the completely
multiplicative function such that $\lambda(p) = -1$ for all primes $p$. We have the
following well known conjecture of Chowla [3]:

Conjecture 1.1 (Chowla conjecture). Let $k \ge 1$, let $a_1, \dots, a_k$ be
natural numbers and let $b_1, \dots, b_k$ be distinct nonnegative integers such
that $a_i b_j - a_j b_i \neq 0$ for $1 \le i < j \le k$. Then

\[ \sum_{n \le x} \lambda(a_1 n + b_1) \cdots \lambda(a_k n + b_k) = o(x) \]

as $x \to \infty$.

Thus for instance the $k = 2$ case of the Chowla conjecture implies
that

\[ \sum_{n \le x} \lambda(n) \lambda(n+1) = o(x) \tag{1.1} \]

as $x \to \infty$. This can be compared with the twin prime conjecture,
which is equivalent to the assertion that

\[ \sum_{n \le x} \theta(n) \theta(n+2) \to \infty \tag{1.2} \]

as $x \to \infty$, where $\theta(n) := \log p$ when $n$ is equal to a prime $p$, and
$\theta(n) := 0$ otherwise.
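
The sum in (1.1) is easy to explore numerically. The following is a minimal sketch, not part of the argument of this paper: the helper names `liouville_sieve`, `two_point_sum` and `log_averaged_sum` are illustrative choices, and the script only tabulates the unaveraged sum and its logarithmically weighted analogue for a finite range, which of course proves nothing about the limit.

```python
from math import log


def liouville_sieve(limit):
    """Return a list lam with lam[n] = lambda(n) for 1 <= n <= limit.

    lam[0] is unused. The values are built from a smallest-prime-factor
    sieve together with the relation lambda(n) = -lambda(n / p) for p | n.
    """
    spf = list(range(limit + 1))  # spf[n] = smallest prime factor of n
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (limit + 1)
    if limit >= 1:
        lam[1] = 1
    for n in range(2, limit + 1):
        lam[n] = -lam[n // spf[n]]
    return lam


def two_point_sum(lam, x):
    """Sum of lambda(n) * lambda(n + 1) over 1 <= n <= x (needs lam up to x + 1)."""
    return sum(lam[n] * lam[n + 1] for n in range(1, x + 1))


def log_averaged_sum(lam, x):
    """Sum of lambda(n) * lambda(n + 1) / n over 1 <= n <= x."""
    return sum(lam[n] * lam[n + 1] / n for n in range(1, x + 1))


if __name__ == "__main__":
    x = 10 ** 5
    lam = liouville_sieve(x + 1)
    # Normalised quantities; the conjecture predicts that the first tends to 0.
    print(two_point_sum(lam, x) / x)
    print(log_averaged_sum(lam, x) / log(x))
```

The sieve is used instead of trial division so that the whole range is handled in one pass; any other correct way of tabulating $\lambda$ would serve equally well.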

The $k = 1$ case of the Chowla conjecture is equivalent to the prime
number theorem. The higher $k$ cases are open, although there are a
number of partial results available if one allows for some averaging in
the $b_1, \dots, b_k$ parameters; see [23], [?] for some recent results in this
direction. The bound (1.1) is equivalent to the assertion that the pairs
$(\lambda(n), \lambda(n+1))$ attain each of the four sign patterns $(+1,+1)$, $(+1,-1)$,
$(-1,+1)$, $(-1,-1)$ $(\frac{1}{4} + o(1))x$ times. In [12] it was shown that the
$(+1,+1)$ and $(-1,-1)$ patterns occur at least $(\frac{1}{6} + o(1))x$ times, and
the $(+1,-1)$ and $(-1,+1)$ patterns occur $\gg x \log^{-7-\varepsilon} x$ times for $\varepsilon > 0$.
In the recent paper [21] it was shown that in fact all four sign patterns
occur $\gg x$ times, so in particular

\[ \left| \sum_{n \le x} \lambda(n) \lambda(n+1) \right| \le (1 - \delta) x \]

for some absolute constant $\delta > 0$ and sufficiently large $x$. An analogous
claim for sign patterns $(\lambda(n), \lambda(n+1), \lambda(n+2))$ of length three was
shown in [23], building upon the previous result in [?] that showed
that all sign patterns of length three occur infinitely often.

The first main result of this paper is to obtain a different averaged
form of the Chowla conjecture in the first nontrivial case $k = 2$, in
which one averages in $x$ rather than in $b_1, \dots, b_k$. More precisely, we
show

Theorem 1.2 (Logarithmically averaged Chowla conjecture). Let $a_1, a_2$
be natural numbers, and let $b_1, b_2$ be integers such that $a_1 b_2 - a_2 b_1 \neq 0$.
Let $1 \le \omega(x) \le x$ be a quantity depending on $x$ that goes to infinity as
$x \to \infty$. Then one has

\[ \sum_{x/\omega(x) < n \le x} \frac{\lambda(a_1 n + b_1) \lambda(a_2 n + b_2)}{n} = o(\log \omega(x)) \tag{1.3} \]

as $x \to \infty$.
Thus for instance this theorem implies (after setting $\omega(x) := x$,
$a_1 = a_2 = b_2 = 1$ and $b_1 = 0$) that

\[ \sum_{n \le x} \frac{\lambda(n) \lambda(n+1)}{n} = o(\log x) \tag{1.4} \]

as $x \to \infty$; this can be deduced from (1.1) by a routine summation
by parts argument, but is a strictly weaker estimate. From this and
the elementary estimate $\sum_{n \le x} \frac{\lambda(n)}{n} = o(\log x)$ we see that for any sign
pattern $(\epsilon_1, \epsilon_2) \in \{-1,+1\}^2$, the set $\{ n : (\lambda(n), \lambda(n+1)) = (\epsilon_1, \epsilon_2) \}$
occurs with logarithmic density $1/4$, that is to say

\[ \frac{1}{\log x} \sum_{n \le x : (\lambda(n), \lambda(n+1)) = (\epsilon_1, \epsilon_2)} \frac{1}{n} = \frac{1}{4} + o(1) \]

as $x \to \infty$.
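
For the reader's convenience, the deduction of the logarithmic density statement from (1.4) can be sketched as follows. This is a routine expansion written out under the assumption that only (1.4) and the elementary estimate for $\sum_{n \le x} \lambda(n)/n$ are used; it is not taken verbatim from the paper.

```latex
% Indicator of a sign pattern, for (\epsilon_1,\epsilon_2) \in \{-1,+1\}^2:
\[
  1_{(\lambda(n),\lambda(n+1)) = (\epsilon_1,\epsilon_2)}
  = \frac{(1 + \epsilon_1 \lambda(n))(1 + \epsilon_2 \lambda(n+1))}{4}
  = \frac{1}{4}\Bigl( 1 + \epsilon_1 \lambda(n) + \epsilon_2 \lambda(n+1)
      + \epsilon_1 \epsilon_2 \lambda(n)\lambda(n+1) \Bigr).
\]
% Sum against the weights 1/n over n \le x:
\[
  \sum_{n \le x} \frac{1}{n} = \log x + O(1), \qquad
  \sum_{n \le x} \frac{\lambda(n)}{n} = o(\log x), \qquad
  \sum_{n \le x} \frac{\lambda(n+1)}{n}
    = \sum_{n \le x} \frac{\lambda(n+1)}{n+1} + O(1) = o(\log x),
\]
% and the last term is o(\log x) by (1.4). Dividing by \log x gives
\[
  \frac{1}{\log x}
  \sum_{n \le x : (\lambda(n),\lambda(n+1)) = (\epsilon_1,\epsilon_2)} \frac{1}{n}
  = \frac{1}{4} + o(1).
\]
```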

More generally, one can deduce Theorem 1.2 from the $k = 2$ case
of Conjecture 1.1 by summation by parts; we leave the details to the
interested reader. Conversely, the $k = 2$ case of Conjecture 1.1 is
equivalent to the limiting case of Theorem 1.2 in which $\omega$ is fixed rather than
going to infinity. The logarithmic averaging is unfortunately needed in
our method in order to obtain an approximate affine invariance in the
$n$ variable; we do not know how to modify our argument to remove
this averaging. However, the logarithmic averaging can be tolerated
in some applications (for instance to the Erdős discrepancy problem,
discussed below).
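
In the model case of (1.1) and (1.4), the summation by parts argument alluded to above can be sketched as follows. The notation $S(t)$ and the cutoff $t_0(\delta)$ are introduced here purely for illustration.

```latex
% Write S(t) := \sum_{n \le t} \lambda(n)\lambda(n+1), so |S(t)| \le t and,
% assuming (1.1), S(t) = o(t). Abel summation with the weight 1/t gives
\[
  \sum_{n \le x} \frac{\lambda(n)\lambda(n+1)}{n}
  = \frac{S(x)}{x} + \int_1^x \frac{S(t)}{t^2}\, dt .
\]
% Fix \delta > 0 and choose t_0 = t_0(\delta) with |S(t)| \le \delta t for t \ge t_0.
% Then
\[
  \left| \int_1^x \frac{S(t)}{t^2}\, dt \right|
  \le \int_1^{t_0} \frac{dt}{t} + \delta \int_{t_0}^x \frac{dt}{t}
  \le \log t_0 + \delta \log x ,
\]
% which is at most 2\delta \log x once x is large enough depending on \delta.
% Since \delta was arbitrary, the left-hand side of (1.4) is o(\log x).
```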

Estimates such as (1.1), (1.2), (1.3), (1.4) are well known to be
subject to the parity problem obstruction (see e.g. [?, Chapter 16]), and
thus cannot be resolved purely by existing sieve-theoretic (or circle
method) techniques that rely solely on "linear" estimates for the
Liouville function. We avoid the parity obstacle here by using a new
"bilinear" estimate^1 for the Liouville function, which relates to bounds
such as (1.4) through the multiplicativity property $\lambda(pn) = -\lambda(n)$ of
the Liouville function at small primes $p$, and which is proved using the
(weak) expansion properties of a certain random graph, closely related
to one recently introduced in [21]. To describe this strategy in
somewhat informal terms, let us specialise to the case of establishing (1.4)
for simplicity. Suppose for contradiction that the left-hand side of (1.4)
was large and (say) positive. Using the multiplicativity $\lambda(pn) = -\lambda(n)$,
we conclude that

\[ \sum_{n \le x} \frac{\lambda(n) \lambda(n+p) 1_{p|n}}{n} \]

is also large and positive for all primes $p$ that are not too large; note
here how the logarithmic averaging allows us to leave the constraint

^1 Bilinear estimates have been used to get around the parity obstacle in previous
works, most notably in the Friedlander-Iwaniec result [?] on primes of the form
$a^2 + b^4$.
$n \le x$ unchanged. Summing in $p$, we conclude that

\[ \sum_{n \le x} \frac{\sum_{p \in \mathcal{P}} \lambda(n) \lambda(n+p) 1_{p|n}}{n} \]

is large and positive for any given set $\mathcal{P}$ of medium-sized primes. By a
standard averaging argument, this implies that

\[ \frac{1}{H} \sum_{j=1}^{H} \sum_{p \in \mathcal{P}} \lambda(n+j) \lambda(n+j+p) 1_{p|n+j} \tag{1.5} \]

is large for many choices of $n$, where $H$ is a medium-sized
parameter at our disposal to choose, and we take $\mathcal{P}$ to be some set of
primes that are somewhat smaller than $H$. To obtain the required
contradiction, one thus wants to demonstrate significant cancellation
in the expression (1.5). As in [21], we view $n$ as a random variable, in
which case (1.5) is essentially a bilinear sum of the random sequence
$(\lambda(n+1), \dots, \lambda(n+H))$ along a random graph $G_{n,H}$ on $\{1, \dots, H\}$, in
which two vertices $j, j+p$ are connected if they differ by a prime $p$ in
$\mathcal{P}$ that divides $n + j$. A key difficulty in controlling this sum is that for
randomly chosen $n$, the sequence $(\lambda(n+1), \dots, \lambda(n+H))$ and the graph
$G_{n,H}$ need not be independent. To get around this obstacle we
introduce a new argument which we call the "entropy decrement argument"
(in analogy with the "density increment argument" and "energy
increment argument" that appear in the literature surrounding Szemerédi's
theorem on arithmetic progressions (see e.g. [22]), and also
reminiscent of the "entropy compression argument" of Moser and Tardos [26]).
This argument, which is a simple consequence of the Shannon entropy
inequalities, can be viewed as a quantitative version of the standard
subadditivity argument that establishes the existence of
Kolmogorov-Sinai entropy in topological dynamical systems; it allows one to select
a scale parameter $H$ (in some suitable range $[H_-, H_+]$) for which the
sequence $(\lambda(n+1), \dots, \lambda(n+H))$ and the graph $G_{n,H}$ exhibit some
weak independence properties (or more precisely, the mutual
information between the two random variables is small). With this additional
property, one can use standard concentration of measure results such as
the Hoeffding inequality [18] to approximate (1.5) by the significantly
simpler expression

\[ \frac{1}{H} \sum_{j=1}^{H} \sum_{p \in \mathcal{P}} \frac{\lambda(n+j) \lambda(n+j+p)}{p}. \]

This latter expression can then be controlled in turn by an application
of the Hardy-Littlewood circle method and an estimate for short sums
of a modulated Liouville function established recently by Matomäki,
Radziwiłł and the author in [23], which is based in turn on the results
of Matomäki and Radziwiłł in [21].
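
The first step of the strategy above rests on an exact identity: since $\lambda(pn)\lambda(pn+p) = \lambda(n)\lambda(n+1)$, the logarithmically weighted sum over $n \le x$ equals a sum over multiples $m = pn$ of $p$ with $m \le px$, carrying an extra weight $p$. The sketch below checks this with exact rational arithmetic. The function names are illustrative assumptions, and note that the exact identity has the range $m \le px$; replacing it by $m \le x$, as in the informal discussion, changes the sum by $O(\log p)$, which is the sense in which the constraint can be left "unchanged" under logarithmic averaging.

```python
from fractions import Fraction


def liouville(n):
    """Liouville function: (-1) to the number of prime factors with multiplicity."""
    sign, d = 1, 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            sign = -sign
        d += 1
    if n > 1:  # leftover prime factor
        sign = -sign
    return sign


def log_sum_shift_one(x):
    """Exact value of sum_{n <= x} lambda(n) lambda(n+1) / n."""
    return sum(Fraction(liouville(n) * liouville(n + 1), n)
               for n in range(1, x + 1))


def log_sum_dilated(x, p):
    """Exact value of sum over multiples m of p with m <= p*x of
    p * lambda(m) * lambda(m + p) / m."""
    return sum(Fraction(p * liouville(m) * liouville(m + p), m)
               for m in range(p, p * x + 1, p))


if __name__ == "__main__":
    x = 200
    for p in (2, 3, 5, 7, 11):
        # The two sides agree exactly for every prime p.
        print(p, log_sum_shift_one(x) == log_sum_dilated(x, p))
```

Exact fractions are used so that the comparison is an identity check rather than a floating-point approximation.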
The arguments in this paper extend to other bounded multiplicative
functions than the Liouville function, though as they rely in an
essential fashion on multiplicativity at small primes, they unfortunately do
not appear to have any bearing as yet on twin prime-type sums such
as (1.2). More precisely, we have the following logarithmically
averaged and nonasymptotic version of the Elliott conjecture [1] (in the
"corrected" form introduced in [23]):

Theorem 1.3 (Logarithmically averaged nonasymptotic Elliott
conjecture). Let $a_1, a_2$ be natural numbers, and let $b_1, b_2$ be integers such
that $a_1 b_2 - a_2 b_1 \neq 0$. Let $\varepsilon > 0$, and suppose that $A$ is sufficiently large
depending on $\varepsilon, a_1, a_2, b_1, b_2$. Let $x \ge \omega \ge A$, and let $g_1, g_2 : \mathbb{N} \to \mathbb{C}$
be multiplicative functions with $|g_1(n)|, |g_2(n)| \le 1$ for all $n$, with $g_1$
"non-pretentious" in the sense that

\[ \sum_{p \le x} \frac{1 - \mathrm{Re}\, g_1(p) \overline{\chi(p)} p^{-it}}{p} \ge A \tag{1.6} \]

for all Dirichlet characters $\chi$ of period at most $A$, and all real numbers
$t$ with $|t| \le Ax$. Then

\[ \left| \sum_{x/\omega < n \le x} \frac{g_1(a_1 n + b_1) g_2(a_2 n + b_2)}{n} \right| \le \varepsilon \log \omega. \tag{1.7} \]

Remark 1.4. Our arguments are in principle effective, and would yield
an explicit value of $A$ as a function of $\varepsilon, a_1, a_2, b_1, b_2$ if one went through
all the arguments carefully; however, we did not do so here, as we expect^2
the bounds to be rather poor.


Theorem 1.3 clearly implies the following asymptotic version:


Corollary 1.5 (Logarithmically averaged Elliott conjecture). Let $a_1, a_2$
be natural numbers, and let $b_1, b_2$ be integers such that $a_1 b_2 - a_2 b_1 \neq 0$.
Let $g_1, g_2 : \mathbb{N} \to \mathbb{C}$ be multiplicative functions bounded in magnitude by
one, with $g_1$ "non-pretentious" in the sense that

\[ \inf_{|t| \le Ax} \sum_{p \le x} \frac{1 - \mathrm{Re}\, g_1(p) \overline{\chi(p)} p^{-it}}{p} \to +\infty \tag{1.8} \]

^2 For instance, a back of the envelope calculation suggests that the decay rate
in the right-hand side of (1.4) provided by optimising all the parameters in the
arguments in this paper is something like $O\left( \frac{\log x}{(\log\log\log x)^{c}} \right)$ for some small absolute
constant $c > 0$; similarly, the dependence of $A$ on $1/\varepsilon$ provided by the arguments in
this paper appears to be roughly triple-exponential in nature, at least in the model
case where $g_1, g_2$ are completely multiplicative and take values on the unit circle.
as $x \to \infty$ for all Dirichlet characters $\chi$ and all $A \ge 1$. Then for any
$1 \le \omega(x) \le x$ which goes to infinity as $x \to \infty$, one has

\[ \sum_{x/\omega(x) < n \le x} \frac{g_1(a_1 n + b_1) g_2(a_2 n + b_2)}{n} = o(\log \omega(x)) \tag{1.9} \]

as $x \to \infty$.

Remark 1.6. If one replaced the conclusion (1.9) with the stronger,
non-logarithmically-averaged estimate

\[ \sum_{n \le x} g_1(a_1 n + b_1) g_2(a_2 n + b_2) = o(x) \tag{1.10} \]

(say with $b_1, b_2 \ge 0$ to avoid the linear forms $a_1 n + b_1$, $a_2 n + b_2$ leaving
the domain of $g_1, g_2$) then this is the $k = 2$ version of the corrected
Elliott conjecture introduced in [23]. The original Elliott conjecture in
[1] replaced the condition (1.8) with the weaker condition

\[ \sum_{p} \frac{1 - \mathrm{Re}\, g_1(p) \overline{\chi(p)} p^{-it}}{p} = +\infty \]

for all real numbers $t \in \mathbb{R}$, but it was shown in [23] that this hypothesis
was insufficient to establish (1.10) (and it is not difficult to adapt the
counterexample to also show that (1.9) fails under this hypothesis). On
the other hand, it was shown in [23] that the corrected Elliott conjecture
held if one averaged in the $b_1, \dots, b_k$ parameters (rather than in the $x$
parameter as is done here).

Using Vinogradov-Korobov error term zero-free region for $L$-functions
(see [25, §9.5]), it is not difficult to establish (1.8) when $g_1$ is the
Liouville function; see [22, Lemma 2] for a closely related calculation. Thus
Corollary 1.5 implies Theorem 1.2. Some condition of the form (1.8)
must be needed in order to derive the conclusion (1.9), as one can see
by considering examples such as $g_1(n) := \chi(n) n^{it}$ and $g_2(n) := \overline{g_1(n)}$,
where $\chi$ is a Dirichlet character of bounded conductor, $t$ is a real
number of size $t = o(x)$, and $\omega$ is set equal to (for instance) $(x/|t|)^{1/2}$.
More precise asymptotics of sums such as those in (1.9) in the
"pretentious" case when $g_1$ and $g_2$ both behave like twisted Dirichlet characters
$n \mapsto \chi(n) n^{it}$ were computed in the recent preprint of Klurman [20].

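Corollary 1.5 also implies the asymptotic

\[ \sum_{x/\omega(x) < n \le x} \frac{g_1(a_1 n + b_1) g_2(a_2 n + b_2)}{n} = o(\log \omega(x)) \]

as $x \to \infty$ when $g_1, g_2$ are multiplicative functions bounded by 1, and at
least one of $g_1, g_2$ is equal to the Möbius function $\mu$. Thus for instance
one has

\[ \sum_{n \le x} \frac{\mu(n) \mu(n+1)}{n} = o(\log x), \quad
   \sum_{n \le x} \frac{\mu(n) \mu^2(n+1)}{n} = o(\log x), \quad
   \sum_{n \le x} \frac{\mu^2(n) \mu(n+1)}{n} = o(\log x). \]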
The latter two estimates can be easily deduced from the prime
number theorem in arithmetic progressions, but the first estimate is new.
Combining this with the computations in [21, §2] (using logarithmic
density in place of asymptotic probability), we conclude


Corollary 1.7 (Sign patterns of the Möbius function). Let

\[ c := \prod_{p} \left( 1 - \frac{2}{p^2} \right) = 0.3226\ldots \]

and let $(\epsilon_1, \epsilon_2) \in \{-1, 0, +1\}^2$. Then the set $\{ n : (\mu(n), \mu(n+1)) = (\epsilon_1, \epsilon_2) \}$
has logarithmic density

• $1 - \frac{2}{\zeta(2)} + c = 0.1067\ldots$ when $(\epsilon_1, \epsilon_2) = (0, 0)$;

• $\frac{1}{2\zeta(2)} - \frac{c}{2} = 0.1426\ldots$ when $(\epsilon_1, \epsilon_2) = (+1, 0), (-1, 0), (0, +1), (0, -1)$;
and

• $\frac{c}{4} = 0.0806\ldots$ when $(\epsilon_1, \epsilon_2) = (+1, +1), (+1, -1), (-1, +1), (-1, -1)$.
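
As a sanity check on the constants in Corollary 1.7, one can evaluate them numerically. The sketch below truncates the Euler product for $c$ at a finite prime bound (the parameter `prime_limit` is an illustrative assumption, as are the function and key names) and uses $\zeta(2) = \pi^2/6$; it also confirms that the nine densities add up to 1, as they must.

```python
from math import pi


def primes_up_to(limit):
    """Sieve of Eratosthenes returning all primes <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, limit + 1, p):
                is_prime[m] = False
    return [n for n in range(2, limit + 1) if is_prime[n]]


def moebius_pattern_densities(prime_limit=10 ** 4):
    """Approximate c = prod_p (1 - 2/p^2) and the three density values.

    The product is truncated at prime_limit, so c is only approximate;
    the neglected tail factor differs from 1 by roughly 2/(N log N).
    """
    c = 1.0
    for p in primes_up_to(prime_limit):
        c *= 1.0 - 2.0 / (p * p)
    inv_zeta2 = 6.0 / pi ** 2  # 1 / zeta(2)
    return {
        "c": c,
        "both_zero": 1.0 - 2.0 * inv_zeta2 + c,
        "exactly_one_zero": inv_zeta2 / 2.0 - c / 2.0,
        "no_zero": c / 4.0,
    }


if __name__ == "__main__":
    d = moebius_pattern_densities()
    for name, value in d.items():
        print(name, round(value, 4))
    # One pattern (0,0), four patterns with one zero, four with no zero.
    print(d["both_zero"] + 4 * d["exactly_one_zero"] + 4 * d["no_zero"])
```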

Again, the first two cases here could already be treated using the
prime number theorem in arithmetic progressions, but the last case is
new. One can also use similar arguments to give an alternate proof
of [21, Theorem 1.9] (that is to say, that all nine of the above sign
patterns for the Möbius function occur with positive lower density);
we leave the details to the interested reader.

In a subsequent paper [30], we will combine Theorem 1.3 with some
arguments arising from the Polymath5 project [27] to obtain an
affirmative answer to the Erdős discrepancy problem [2]:


Theorem 1.8. Let $f : \mathbb{N} \to \{-1, +1\}$ be a function. Then

\[ \sup_{d, n \in \mathbb{N}} \left| \sum_{j=1}^{n} f(jd) \right| = +\infty. \]


1.1. Notation. We adopt the usual asymptotic notation of $X \ll Y$,
$Y \gg X$, or $X = O(Y)$ to denote the assertion that $|X| \le CY$ for some
constant $C$. If we need $C$ to depend on an additional parameter we will
denote this by subscripts, e.g. $X = O_\varepsilon(Y)$ denotes the bound $|X| \le
C_\varepsilon Y$ for some $C_\varepsilon$ depending on $\varepsilon$. Similarly, we use $X = o_{A \to \infty}(Y)$
to denote the bound $|X| \le c(A) Y$ where $c(A)$ depends only on $A$ and
goes to zero as $A \to \infty$.

If $E$ is a statement, we use $1_E$ to denote the indicator, thus $1_E = 1$
when $E$ is true and $1_E = 0$ when $E$ is false.

Given a finite set $S$, we use $|S|$ to denote its cardinality.

For any real number $\alpha$, we write $e(\alpha) := e^{2\pi i \alpha}$; this quantity lies in
the unit circle $S^1 := \{ z \in \mathbb{C} : |z| = 1 \}$. By abuse of notation, we can
also define $e(\alpha)$ when $\alpha$ lies in the additive unit circle $\mathbb{R}/\mathbb{Z}$.

All sums and products will be over the natural numbers $\mathbb{N} = \{1, 2, \dots\}$
unless otherwise specified, with the exception of sums and products
over $p$ which is always understood to be prime.
We use $d | n$ to denote the assertion that $d$ divides $n$, and $n\ (d)$ to
denote the residue class of $n$ modulo $d$. We use $(a, b)$ to denote the
greatest common divisor of $a$ and $b$.

We will frequently use probabilistic notation such as the expectation
$\mathbb{E} \mathbf{X}$ of a random variable $\mathbf{X}$ or a probability $\mathbb{P}(E)$ of an event $E$; later
we will also need the Shannon entropy $\mathbb{H}(\mathbf{X})$ of a discrete random
variable, as well as related quantities such as conditional entropy $\mathbb{H}(\mathbf{X}|\mathbf{Y})$
or mutual information $\mathbb{I}(\mathbf{X}, \mathbf{Y})$, the definitions of which we review in
Section 3. We will use boldface symbols such as $\mathbf{X}$, $\mathbf{Y}$ or $\mathbf{n}$ to refer to
random variables.
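
For orientation before Section 3, here is a small sketch of the entropy quantities just mentioned, using the standard textbook definitions with natural logarithms. It is an assumption that these match the normalisation used in Section 3; the function names, and the convention of representing a random variable by a list of equally likely outcomes, are illustrative only.

```python
from collections import Counter
from math import log


def entropy(outcomes):
    """Shannon entropy H(X) of the empirical distribution of a list of
    equally likely outcomes, in nats: sum over values of p * log(1/p)."""
    total = len(outcomes)
    counts = Counter(outcomes)
    return sum((c / total) * log(total / c) for c in counts.values())


def conditional_entropy(pairs):
    """H(X | Y) = H(X, Y) - H(Y) for a list of equally likely (x, y) pairs."""
    ys = [y for _, y in pairs]
    return entropy(pairs) - entropy(ys)


def mutual_information(pairs):
    """I(X, Y) = H(X) + H(Y) - H(X, Y) for a list of equally likely pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    return entropy(xs) + entropy(ys) - entropy(pairs)


if __name__ == "__main__":
    independent = [(x, y) for x in (0, 1) for y in (0, 1)]
    dependent = [(0, 0), (1, 1)]
    # Independent fair bits share no information; identical bits share log 2.
    print(mutual_information(independent), mutual_information(dependent))
```

Small mutual information is the "weak independence" referred to in the introduction: it vanishes exactly when the two variables are independent.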

1.2. Acknowledgments. The author is supported by NSF grant DMS- 
0649473 and by a Simons Investigator Award. The author also thanks 
Andrew Granville, Ben Green, Kaisa Matomaki, Maksym Radziwill, 
and Will Sawin for helpful discussions, corrections, and comments, and 
the anonymous referees for a careful reading of the paper and many 
useful suggestions and corrections. 


2. Preliminary reductions

In this section we make a number of basic reductions, in particular
reducing matters to a probabilistic problem involving a random graph,
somewhat similar to one considered in [21]. Readers who are interested
just in the case of the Liouville function (Theorem 1.2) can skip the
initial reductions and move directly^3 to Theorem 2.3 below.

As mentioned in the introduction, Theorem 1.2 is a special case of
Corollary 1.5, which is in turn a corollary of Theorem 1.3. Thus it will
suffice to establish Theorem 1.3.

We first reduce to the case when $g_1$ takes values on the unit circle
$S^1$.


Proposition 2.1. In order to establish Theorem 1.3, it suffices to do
so in the special case where $|g_1(n)| = 1$ for all $n$.


Proof. Suppose that $g_1$ takes values in the unit disk. Then we may
factorise $g_1 = g_1' g_1''$ where $g_1', g_1''$ are multiplicative, with $g_1' := |g_1|$ taking
values in $[0, 1]$ and $g_1''$ taking values in the unit circle $S^1$.

Let $A_0$ be a large quantity (depending on $a_1, a_2, b_1, b_2, \varepsilon$) to be chosen
later; we assume that $A$ is sufficiently large depending on $a_1, a_2, b_1, b_2, A_0$.

Suppose first that

\[ \sum_{p \le x} \frac{1 - g_1'(p)}{p} \ge A_0. \]


^3 For the application to the Erdős discrepancy problem in [30], one only needs
the special case when $g_2 = \overline{g_1}$ and $g_1$ is completely multiplicative and takes values
in $S^1$. In that case one can also move directly to Theorem 2.3, skipping the initial
reductions.
By Mertens' theorem and the largeness of $A_0$ and $x$, this implies that

\[ \sum_{p \le y} \frac{1 - g_1'(p)}{p} \ge \frac{A_0}{2} \]

for every $\dots \le y \le x$ (say). Applying the Halász inequality (see e.g.
[52] or [12, Corollary 1]) we conclude that

\[ \frac{1}{y} \sum_{n \le y} g_1'(n) \ll A_0 \exp(-A_0/2) \]

for all $\dots \le y \le x$ (assuming $x \ge A$ and $A$ is sufficiently large
depending on $A_0$). From this and the nonnegativity and boundedness
of $g_1'(n)$ it is easy to see that

\[ \sum_{x/\omega < n \le x} \frac{g_1'(a_1 n + b_1)}{n} = o_{A_0 \to \infty}(\log \omega) \]

since $x \ge \omega \ge A$ and $A$ is large compared to $A_0$, and $A_0$ is large
compared to $a_1, b_1$. Since $g_1(a_1 n + b_1) g_2(a_2 n + b_2)$ is bounded in magnitude
by $g_1'(a_1 n + b_1)$, the claim (1.7) now follows from the triangle inequality
(taking $A_0$ large enough).

It remains to treat the case when

\[ \sum_{p \le x} \frac{1 - g_1'(p)}{p} < A_0. \]

We now use the probabilistic method to model $g_1'$ by a multiplicative
function of unit magnitude. Since $g_1'(p^j)$ takes values in the convex
hull of $\{-1, +1\}$ for every prime power $p^j$, we can construct a random
multiplicative function $\mathbf{g}_1'$ taking values in $\{-1, +1\}$, such that the
values $\mathbf{g}_1'(p^j)$ at prime powers are jointly independent and have mean
$\mathbb{E} \mathbf{g}_1'(p^j) = g_1'(p^j)$. By multiplicativity and joint independence, we thus
have $\mathbb{E} \mathbf{g}_1'(n) = g_1'(n)$ for arbitrary $n$. By linearity of expectation we
have

\[ \mathbb{E} \sum_{p \le x} \frac{1 - \mathbf{g}_1'(p)}{p} < A_0, \]

so by Markov's inequality we see with probability $1 - O(1/A_0)$ that

\[ \sum_{p \le x} \frac{1 - \mathbf{g}_1'(p)}{p} \le A_0^2. \]

Let us restrict to this event, and set $\mathbf{g}_1 := \mathbf{g}_1' g_1''$, thus $\mathbf{g}_1$ is a random
multiplicative function taking values in $S^1$ whose mean is $g_1$. By the
triangle inequality we have

\[ g_1(p) = \mathbf{g}_1(p) + O(1 - g_1'(p)) + O(1 - \mathbf{g}_1'(p)) \]
and hence by (1.6) and the triangle inequality again we have

\[ \sum_{p \le x} \frac{1 - \mathrm{Re}\, \mathbf{g}_1(p) \overline{\chi(p)} p^{-it}}{p} \ge A/2 \]

for all Dirichlet characters $\chi$ of period at most $A$ and all $t$ with $|t| \le Ax$,
if $A$ is large enough. Using the hypothesis that Theorem 1.3 holds when
$g_1$ has unit magnitude, we conclude (again taking $A$ large enough) that

\[ \left| \sum_{x/\omega < n \le x} \frac{\mathbf{g}_1(a_1 n + b_1) g_2(a_2 n + b_2)}{n} \right| \le \frac{\varepsilon}{2} \log \omega \tag{2.1} \]

with probability $1 - O(1/A_0)$. In the exceptional event that this fails,
we can still bound the left-hand side of (2.1) by $O(\log \omega)$. Taking
expectations, we obtain (1.7) as desired (for $A_0$ large enough). □
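
The random model used in this proof can be made concrete as follows. This is only a sketch of one natural construction consistent with the description above (independent signs at prime powers with prescribed means); the symbol $m$ is introduced here for illustration.

```latex
% For each prime power p^j, write m := g_1'(p^j) \in [0,1] \subset [-1,1]
% and choose the sign \mathbf{g}_1'(p^j) \in \{-1,+1\} independently with
\[
  \mathbb{P}\bigl( \mathbf{g}_1'(p^j) = +1 \bigr) = \frac{1+m}{2}, \qquad
  \mathbb{P}\bigl( \mathbf{g}_1'(p^j) = -1 \bigr) = \frac{1-m}{2},
  \qquad\text{so that}\qquad
  \mathbb{E}\, \mathbf{g}_1'(p^j) = \frac{1+m}{2} - \frac{1-m}{2} = m .
\]
% Extending multiplicatively, for n = p_1^{j_1} \cdots p_r^{j_r} independence gives
\[
  \mathbb{E}\, \mathbf{g}_1'(n)
  = \prod_{i=1}^{r} \mathbb{E}\, \mathbf{g}_1'(p_i^{j_i})
  = \prod_{i=1}^{r} g_1'(p_i^{j_i}) = g_1'(n).
\]
% Since 1 - \mathbf{g}_1'(p) \ge 0, Markov's inequality applies to the nonnegative sum:
\[
  \mathbb{P}\Bigl( \sum_{p \le x} \frac{1 - \mathbf{g}_1'(p)}{p} > A_0^2 \Bigr)
  \le \frac{1}{A_0^2}\, \mathbb{E} \sum_{p \le x} \frac{1 - \mathbf{g}_1'(p)}{p}
  < \frac{1}{A_0}.
\]
```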


A similar argument allows one to also reduce to the case where
$|g_2(n)| = 1$ for all $n$ (indeed, the argument is slightly simpler as (1.6)
is unaffected by changes in $g_2$).

Next, we upgrade the functions $g_1, g_2$ from being multiplicative to
being completely multiplicative.

Proposition 2.2. In order to establish Theorem 1.3, it suffices to do
so in the special case where $|g_1(n)| = |g_2(n)| = 1$ for all $n$, and $g_1$ is
completely multiplicative.

Proof. By the previous reductions we may already assume that $|g_1(n)| =
|g_2(n)| = 1$ for all $n$. If $g_1$ is not completely multiplicative, we can
introduce the completely multiplicative function $\tilde g_1$ with $\tilde g_1(p) = g_1(p)$ for
all $p$. Clearly, $\tilde g_1$ takes values in $S^1$. From Möbius inversion (twisted
by $\tilde g_1$) we can factor $g_1$ as a Dirichlet convolution $g_1 = \tilde g_1 * h$ for a
multiplicative function $h$ with $h(p) = 0$ and $|h(p^j)| \le 2$ for all $j \ge 2$;
indeed we have $h(p^j) = g_1(p^j) - g_1(p) g_1(p^{j-1})$ for all $j \ge 1$. The left-hand
side of (1.7) can then be rewritten as

\[ \left| \sum_{d} h(d) \sum_{x/\omega < n \le x : d | a_1 n + b_1} \frac{\tilde g_1\left( \frac{a_1 n + b_1}{d} \right) g_2(a_2 n + b_2)}{n} \right|. \]

As in the previous proposition, we choose a quantity $A_0$ that is
sufficiently large depending on $a_1, a_2, b_1, b_2, \varepsilon$, and assume $A$ is sufficiently
large depending on $A_0, a_1, a_2, b_1, b_2, \varepsilon$. We consider first the
contribution to the above sum of a single value of $d$ with $d \le A_0$. We crudely
bound $|h(d)|$ by (say) $A_0$. The constraint $d | a_1 n + b_1$ constrains $n$ to
some set of residue classes modulo $d$; the number of such classes is
trivially bounded by $d$ and hence by $A_0$. Making an appropriate change
of variables and using the hypothesis that Theorem 1.3 holds for
completely multiplicative $g_1$ (replacing $\varepsilon$ by $\varepsilon / 2A_0^3$, and assuming $A$ large
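enough), we thus have

\[ \left| \sum_{x/\omega < n \le x : d | a_1 n + b_1} \frac{\tilde g_1\left( \frac{a_1 n + b_1}{d} \right) g_2(a_2 n + b_2)}{n} \right| \le \frac{\varepsilon}{2 A_0^2} \log \omega \]

for each $d \le A_0$. Thus the total contribution of those $d$ with $d \le A_0$ is
at most $\frac{\varepsilon}{2} \log \omega$.

Now we turn to the contribution where $d > A_0$. Here, we can use
the triangle inequality to bound
$\sum_{x/\omega < n \le x : d | a_1 n + b_1} \frac{\tilde g_1\left( \frac{a_1 n + b_1}{d} \right) g_2(a_2 n + b_2)}{n}$ by
$O\left( \frac{\log \omega}{d} \right)$, so the net contribution of this case is $O\left( \log \omega \sum_{d > A_0} \frac{|h(d)|}{d} \right)$.
However, from taking Euler products one sees that

\[ \sum_{d} \frac{|h(d)|}{d^{2/3}} = O(1) \]

(say), and thus

\[ \sum_{d > A_0} \frac{|h(d)|}{d} = O(A_0^{-1/3}). \]

Taking $A_0$ large enough, we obtain the claim.

□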
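
The factorisation $g_1 = \tilde g_1 * h$ used in this proof can be checked on a familiar example. Taking $g_1 = \mu$ (the Möbius function) gives $\tilde g_1 = \lambda$, and the formula $h(p^j) = g_1(p^j) - g_1(p) g_1(p^{j-1})$ yields $h(p) = 0$, $h(p^2) = -1$ and $h(p^j) = 0$ for $j \ge 3$. The sketch below verifies $\mu = \lambda * h$ for small $n$; the choice of $\mu$ as the test function and all function names are illustrative assumptions, not part of the argument.

```python
def factorize(n):
    """Return {prime: exponent} for n >= 1 by trial division."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def mobius(n):
    """Moebius function: 0 if n has a square factor, else (-1)^(number of primes)."""
    f = factorize(n)
    if any(e > 1 for e in f.values()):
        return 0
    return -1 if len(f) % 2 else 1


def liouville(n):
    """Completely multiplicative function with value -1 at every prime."""
    return -1 if sum(factorize(n).values()) % 2 else 1


def h(d, g=mobius):
    """Multiplicative h with h(p^j) = g(p^j) - g(p) * g(p^(j-1)) for j >= 1."""
    value = 1
    for p, j in factorize(d).items():
        value *= g(p ** j) - g(p) * g(p ** (j - 1))
    return value


def convolution(n, g=mobius, g_tilde=liouville):
    """Dirichlet convolution (g_tilde * h)(n) = sum over d | n of g_tilde(n/d) h(d)."""
    return sum(g_tilde(n // d) * h(d, g)
               for d in range(1, n + 1) if n % d == 0)


if __name__ == "__main__":
    # mu = lambda * h should hold for every n.
    print(all(convolution(n) == mobius(n) for n in range(1, 501)))
```

In this example $h$ is supported on squares of squarefree numbers, which is consistent with the convergence of $\sum_d |h(d)| / d^{2/3}$ used above.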


A similar argument allows one to also reduce to the case where $g_2$
is completely multiplicative. As $g_1, g_2$ are now completely multiplicative and take
values in $S^1$, we have

\[ g_1(a_1 n + b_1) g_2(a_2 n + b_2) = \overline{g_1(a_2)}\, \overline{g_2(a_1)}\, g_1(a_1 a_2 n + a_2 b_1) g_2(a_1 a_2 n + a_1 b_2) \]

so by replacing $a_1, a_2, b_1, b_2$ with $a_1 a_2, a_1 a_2, b_1 a_2, b_2 a_1$ respectively, we
may assume that $a_1 = a_2 = a$, $b_1 = b$, and $b_2 = b + h$ for some natural
number $a$, integer $b$, and nonzero integer $h$.

Finally, we observe that we can strengthen the condition $\omega \le x$
slightly to $\omega \le \frac{x}{\log x}$, since for $\frac{x}{\log x} < \omega \le x$, the contribution of those
$n$ for which $n \le \log x$ can be seen to be negligible. (Indeed, we could
reduce to the case where $\omega$ grew slower than any fixed function of $x$
going to infinity, but the restriction $\omega \le \frac{x}{\log x}$ will suffice for us, as it
prevents the $n$ parameter from being extremely small.)

Putting all these reductions together, we see that Theorem 1.3 will
be a consequence of the following theorem.


Theorem 2.3 (Logarithmically averaged nonasymptotic Elliott
conjecture). Let $a$ be a natural number, and let $b, h$ be integers with $h \neq 0$.
Let $\varepsilon > 0$, and suppose that $A$ is sufficiently large depending on $\varepsilon, a, b, h$.
Let $\frac{x}{\log x} \ge \omega \ge A$, and let $g_1, g_2 : \mathbb{N} \to S^1$ be completely
multiplicative functions such that (1.6) holds for all Dirichlet characters $\chi$ of
period at most $A$, and all real numbers $t$ with $|t| \le Ax$. Then

\[ \left| \sum_{x/\omega < n \le x} \frac{g_1(an + b) g_2(an + b + h)}{n} \right| \le \varepsilon \log \omega. \]
Let $a, b, h, \varepsilon$ be as in the above theorem^4. Suppose for sake of
contradiction that Theorem 2.3 fails for this set of parameters. By shrinking
$\varepsilon$, we may assume that $\varepsilon$ is sufficiently small depending on $a, b, h$. Thus
for instance any quantity of the form $O_{a,b,h}(\varepsilon)$ can be assumed to be
much smaller than 1, any quantity of the form $O_{a,b,h}(\varepsilon^2)$ can be
assumed to be much smaller than $\varepsilon$, and so forth. We will also need a
number of large quantities, chosen in the following order^5:

• We choose a natural number $H_-$ that is sufficiently large
depending on $a, b, h, \varepsilon$.

• Then, we choose a natural number $H_+$ that is sufficiently large
depending on $H_-, a, b, h, \varepsilon$.

• Finally, we choose a quantity $A > 0$ that is sufficiently large
depending on $H_+, H_-, a, b, h, \varepsilon$.

The quantity $A$ is of course the one we will use in Theorem 2.3. The
intermediate parameters $H_-, H_+$ will be the lower and upper ranges for
a certain medium-sized scale $H \in [H_-, H_+]$ which we will later select
using a pigeonholing argument which we call the "entropy decrement
argument".

We will implicitly take repeated advantage of the above relative size
assumptions between the parameters $A, H_+, H_-, a, b, h, \varepsilon$ in the sequel
to simplify the estimates; in particular, we will repeatedly absorb lower
order error terms into higher order error terms when the latter would
dominate the former under the above assumptions. Thus for instance
$O_{H_+, H_-, a, b, h, \varepsilon}(1) \times o_{A \to \infty}(1)$ can be simplified to just $o_{A \to \infty}(1)$ by the
assumption that $A$ is sufficiently large depending on all previous
parameters, and $o_{A \to \infty}(1) + o_{H_- \to \infty}(1)$ can similarly be simplified to $o_{H_- \to \infty}(1)$.
The reader may wish to keep the hierarchy

\[ a, b, h \ll \frac{1}{\varepsilon} \ll H_- \ll p \ll H \ll H_+ \ll A \le \omega \le \frac{x}{\log x} \le x \]

and also

\[ x \ge n \ge x/\omega \ge \log x \ge \log A \gg H_+ \]

in mind in the arguments that follow.

As we are assuming that Theorem 2.3 fails for the indicated choice
of parameters, there exist real numbers

\[ \frac{x}{\log x} \ge \omega \ge A \tag{2.2} \]


^4 The reader may initially wish to restrict to the model case $a = 1$, $b = 0$, $h = 1$
(and also $g_1 = g_2 = \lambda$) in what follows to simplify the notation and arguments
slightly.

^5 For the purposes of optimising the quantitative bounds, it seems that one should
take $H_- = \exp(\varepsilon^{-C_1})$, $H_+ = \exp(\exp(\exp(\varepsilon^{-C_2})))$, and $A = \exp(\exp(\exp(\varepsilon^{-C_3})))$
for some large absolute constants $C_1 < C_2 < C_3$, at least in the regime where $a, b, h$
are bounded and $\varepsilon$ is small, and after adjusting some of the estimates below to fully
optimise the bounds.
and completely multiplicative functions $g_1, g_2 : \mathbb{N} \to S^1$ such that

\[ \sum_{p \le x} \frac{1 - \mathrm{Re}\, g_1(p) \overline{\chi(p)} p^{-it}}{p} \ge A \tag{2.3} \]

for all Dirichlet characters $\chi$ of period at most $A$, and all real numbers
$t$ with $|t| \le Ax$, but such that

\[ \left| \sum_{x/\omega < n \le x} \frac{g_1(an + b) g_2(an + b + h)}{n} \right| > \varepsilon \log \omega. \tag{2.4} \]

To use the hypothesis (2.3), we apply the results in [23] to control
short sums of $g_1$ modulated by Fourier characters.


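Proposition 2.4. Let the notation and assumptions be as above. For
all $H_- \le H \le H_+$, one has

\[ \sup_{\alpha} \sum_{x/\omega < n \le x} \frac{1}{Hn} \left| \sum_{j=1}^{H} g_1(n+j) e(\alpha j) \right| \ll \frac{\log \log H}{\log H} \log \omega. \tag{2.5} \]

In particular, one has

\[ \sup_{\alpha} \sum_{x/\omega < n \le x} \frac{1}{Hn} \left| \sum_{j=1}^{H} g_1(n+j) e(\alpha j) \right| = o_{H_- \to \infty}(\log \omega). \tag{2.6} \]

We remark that Proposition 2.4 is the only way in which we will take
advantage of the hypothesis (2.3), which may now be discarded in the
arguments that follow.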


Proof. Let a e M. Applying m Lemma 2.2, Theorem 2.3] (with W : = 
log^ H), we see that 


- E 


H 


H 

L 

i=i 


gi{n + j)e{aj) 


log log H 
log H 


for all ^ ^ A < 2x; for the purposes of verifying the hypotheses in 
[23] , we note that X ^ ^ ^ ^ , and hence W = log® H will 

be much less than A or (logA)^/^^®. Averaging this estimate from X 
between x/2u and 2x, we obtain fl2.5p and hence fl2.6p . □ 


It will be convenient to interpret these estimates in probabilistic
language (particularly when we start using the concept of Shannon
entropy in the next section). We introduce a (discrete) random variable
$\mathbf{n}$ in the interval $\{ n \in \mathbb{N} : x/\omega < n \le x \}$ by setting

\[ \mathbb{P}(\mathbf{n} = n) = \frac{1/n}{\sum_{n' \in \mathbb{N} : x/\omega < n' \le x} \frac{1}{n'}} \]

whenever $n$ lies in this interval.
From (2.2) and our hypothesis $\omega \le x/\log x$, we see that

\[ \sum_{n \in \mathbb{N} : x/\omega < n \le x} \frac{1}{n} = (1 + o_{A \to \infty}(1)) \log \omega. \]

We conclude from (2.4) that

\[ | \mathbb{E} g_1(a\mathbf{n} + b) g_2(a\mathbf{n} + b + h) | \gg \varepsilon \tag{2.7} \]

while from (2.6) we conclude that

\[ \sup_{\alpha} \mathbb{E} \left| \sum_{j=1}^{H} g_1(\mathbf{n} + j) e(\alpha j) \right| = o_{H_- \to \infty}(H) \tag{2.8} \]

uniformly for all $H_- \le H \le H_+$.
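
The logarithmic probability distribution just introduced is simple to tabulate, and doing so makes the normalisation $(1 + o(1)) \log \omega$ visible. The following is a minimal sketch with integer $x$ and $\omega$; the function name `log_distribution` and the sample parameters are assumptions made for illustration.

```python
from math import log


def log_distribution(x, omega):
    """Return (first, probs), where probs[i] is P(n = first + i) for the
    random variable n supported on the integers x/omega < n <= x with
    probability proportional to 1/n. Here x and omega are positive integers."""
    first = x // omega + 1  # smallest integer strictly greater than x/omega
    weights = [1.0 / n for n in range(first, x + 1)]
    mass = sum(weights)  # this is the harmonic sum, close to log(omega)
    return first, [w / mass for w in weights]


def harmonic_mass(x, omega):
    """The normalising constant: sum of 1/n over x/omega < n <= x."""
    return sum(1.0 / n for n in range(x // omega + 1, x + 1))


if __name__ == "__main__":
    x, omega = 10 ** 6, 10 ** 3
    # The ratio should be close to 1 when x/omega is large.
    print(harmonic_mass(x, omega) / log(omega))
```

The ratio printed at the end differs from 1 by roughly $\omega / (x \log \omega)$, which is why the hypothesis $\omega \le x/\log x$ (keeping $x/\omega$ large) is convenient.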

The logarithmic averaging in the $\mathbf{n}$ variable gives an approximate
affine invariance to these probabilities and expectations (cf. [2?, Lemma
2.3]), which is of fundamental importance to our approach:


Lemma 2.5 (Approximate affine invariance). Let $q$ be a natural
number bounded by $H_+$, and let $r$ be a fixed integer with $|r| \le H_+$. Then
for any event $P(\mathbf{n})$ depending on $\mathbf{n}$, one has

\[ \mathbb{P}(P(\mathbf{n}) \text{ and } \mathbf{n} = r\ (q)) = \frac{1}{q} \mathbb{P}(P(q\mathbf{n} + r)) + o_{A \to \infty}(1). \]

More generally, for any complex-valued random variable $X(\mathbf{n})$
depending on $\mathbf{n}$ and bounded in magnitude by $O(1)$, one has

\[ \mathbb{E}(X(\mathbf{n}) 1_{\mathbf{n} = r\ (q)}) = \frac{1}{q} \mathbb{E}(X(q\mathbf{n} + r)) + o_{A \to \infty}(1). \]

Note in particular that this lemma implies the approximate
translation invariance $\mathbb{P}(P(\mathbf{n} + r)) = \mathbb{P}(P(\mathbf{n})) + o_{A \to \infty}(1)$ and $\mathbb{E}(X(\mathbf{n} + r)) =
\mathbb{E}(X(\mathbf{n})) + o_{A \to \infty}(1)$ for any $r = O(H_+)$. If we did not perform a
logarithmic averaging, then we would still have approximate translation
invariance, but we would not necessarily have the more general
approximate affine invariance, which causes the remainder of our arguments
to break down.


Proof. It suffices to prove the latter claim. The left-hand side can be
written as

\[ \frac{1 + o_{A \to \infty}(1)}{\log \omega} \sum_{x/\omega < n \le x : n = r\ (q)} \frac{X(n)}{n}. \]

Making the change of variables $n = qn' + r$, noting that $\frac{1}{n}$ is equal to
$\frac{1}{qn'} + o_{A \to \infty}\left( \frac{1}{n'} \right)$ uniformly in $n'$, we can write the previous expression
as

\[ \frac{1 + o_{A \to \infty}(1)}{\log \omega} \sum_{x/\omega < qn' + r \le x} \left( \frac{1}{q} \frac{X(qn' + r)}{n'} + o_{A \to \infty}\left( \frac{1}{n'} \right) \right). \]

The net contribution of the $o_{A \to \infty}\left( \frac{1}{n'} \right)$ term can be seen to be $o_{A \to \infty}(1)$
(recall that $A$ is assumed large compared to $H_+$ and hence with $q$).
The constraint $x/\omega < qn' + r \le x$ can be replaced with $x/\omega < n' \le x$
while incurring an error of $O\left( \frac{1}{\log \omega} O(\log q) \right) = o_{A \to \infty}(1)$. The claim
follows. □
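
Both sides of Lemma 2.5 can be computed directly for a toy choice of $X$. The sketch below uses the periodic test function $X(n) = 1_{n \text{ odd}}$ with $q = 3$, $r = 1$, for which both sides should be close to $1/6$. The function names and parameters are illustrative assumptions; note also that for a general bounded $X$ the error term in the lemma is of size about $\log q / \log \omega$, which is only small when $\omega$ is very large compared to $q$, so this check illustrates the statement rather than the sharpness of the error term.

```python
def log_expectation(f, x, omega):
    """E f(n) for n drawn from the integers x/omega < n <= x with
    probability proportional to 1/n (x and omega positive integers)."""
    first = x // omega + 1
    mass = sum(1.0 / n for n in range(first, x + 1))
    return sum(f(n) / n for n in range(first, x + 1)) / mass


def affine_sides(f, q, r, x, omega):
    """Return (left, right) where
    left  = E( f(n) * 1_{n = r (q)} ),
    right = (1/q) * E( f(q*n + r) )."""
    left = log_expectation(lambda n: f(n) if n % q == r % q else 0.0, x, omega)
    right = log_expectation(lambda n: f(q * n + r), x, omega) / q
    return left, right


def is_odd(n):
    """Indicator of odd integers, used as a bounded periodic test function."""
    return 1.0 if n % 2 else 0.0


if __name__ == "__main__":
    left, right = affine_sides(is_odd, q=3, r=1, x=10 ** 5, omega=100)
    # Both values should be close to 1/6.
    print(left, right)
```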


We now give a simple application of the above lemma. By Fourier
expansion (or by positivity) we may insert the constraint $1_{a | \mathbf{n}}$ in the
left-hand side of (2.8) (recalling that $H_-$ is assumed sufficiently large
depending on $a$), and thus by Lemma 2.5 we also have

\[ \sup_{\alpha} \mathbb{E} \left| \sum_{j=1}^{H} g_1(a\mathbf{n} + j) e(\alpha j) \right| = o_{H_- \to \infty}(H). \tag{2.9} \]

This estimate will be useful later in the argument.
From Lemma 2.5 and (2.7) we have

\[ | \mathbb{E} 1_{\mathbf{n} = b\ (a)} g_1(\mathbf{n}) g_2(\mathbf{n} + h) | \gg \varepsilon. \tag{2.10} \]

Crucially, we can exploit the multiplicativity of $g_1, g_2$ at medium-sized
primes to average this lower bound by further application of Lemma
2.5:


Proposition 2.6. Assume that the bound (2.10) holds. Let $H_- \le H \le
H_+$. Let $\mathcal{P}_H$ denote the set of primes between $\frac{\varepsilon^2}{2} H$ and $\varepsilon^2 H$. For each
prime $p$, let $c_p \in S^1$ denote the coefficient $c_p := \overline{g_1}(p) \overline{g_2}(p)$. Then one
has

\[ \left| \mathbb{E} \sum_{p \in \mathcal{P}_H} \sum_{j : j, j + ph \in [1, H]} c_p 1_{a\mathbf{n} + j = pb\ (ap)} g_1(a\mathbf{n} + j) g_2(a\mathbf{n} + j + ph) \right| \gg \varepsilon \frac{H}{\log H}. \tag{2.11} \]

We remark that in the Liouville case $g_1 = g_2 = \lambda$ (and also in the
case $g_2 = \overline{g_1}$ required in the Erdős discrepancy problem application in
[30]), we have $c_p = 1$ for all $p$. This leads to some minor simplification
in the arguments (in particular, we only need to apply Proposition
2.4 for "major arc" values of $\alpha$, allowing one to replace [23, Lemma
2.2, Theorem 2.3] by the simpler [23, Theorem A.1]), however it turns
out that existing results in the literature (in particular, the restriction
theorem for the primes in [13]) allow us to handle the extension to more
general $c_p$ without much additional difficulty.

A key point here is that Proposition 2.6 applies for all scales $H$ in
the range $[H_-, H_+]$. This is because we will not be able to compute
the left-hand side of (2.11) for any specified $H$; however, the "entropy
decrement argument" we will use in the next section will locate
(basically thanks to the pigeonhole principle) a single scale $H$ in the range
$[H_-, H_+]$ for which the left-hand side of (2.11) can be evaluated, at
which point we can apply the above proposition. The inability to
specify the scale $H$ in advance is a key reason why we were unable to remove
the logarithmic averaging from our final result in Theorem 1.3.
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Proof. Write 

X := Eln=b („)5fi(n)5f2(n + h), 

thus fl2.10p tells us that |X| » e. From complete multiplicativity and 
the dehnition of Cp we see that 

ln=6 (a)^l(n)^ 2 (n+ /l) = Cp iap)gi{pn)g2{pn + ph) 

and thus 

Ecplpn=pb (^ap)gi{p^)g2{pn + ph) = X (2.12) 

for any p e Vh- We now claim that 

Ecpln+J=pb (ap)^i(n + i)^ 2 (n + j + ph) = + oa^oo(I) (2.13) 

for any 1 ^ j ^ H and any p e Vh- To see this, we split ln+j=pb (ap) as 
ln=-j (p)ln+j=pb (a) and apply Lemma 12751 to write the left-hand side of 
110311 as 

^Ecplpn=pb (a)5'i(pn)5(2(pn + ph) + Oa^oo(I); 

since lpn=pb (a) = lpn=pb (ap), the claim now follows from fl2.12p . 

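Summing (2.13) over $j = 1, \dots, H$, we have

$$\mathbb{E} c_p \sum_{j=1}^H 1_{\mathbf{n}+j = pb\ (ap)} g_1(\mathbf{n}+j) g_2(\mathbf{n}+j+ph) = \frac{H}{p} X + o_{A\to\infty}(1). \tag{2.14}$$

Now let us introduce the quantities

$$Q(s) := \mathbb{E} c_p \sum_{j=1}^H 1_{\mathbf{n}+j = pb\ (ap)} g_1(\mathbf{n}+j) g_2(\mathbf{n}+j+ph) 1_{\mathbf{n} = s\ (a)} \tag{2.15}$$

for $s \in \mathbb{Z}/a\mathbb{Z}$. From (2.14) we have

$$\sum_{s \in \mathbb{Z}/a\mathbb{Z}} Q(s) = \frac{H}{p} X + o_{A\to\infty}(1). \tag{2.16}$$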

Now let us compare $Q(s)$ with $Q(s+1)$. Using Lemma 2.5 to replace $\mathbf{n}$ with $\mathbf{n}+1$, we see that

$$Q(s+1) = \mathbb{E} c_p \sum_{j=1}^H 1_{\mathbf{n}+1+j = pb\ (ap)} g_1(\mathbf{n}+1+j) g_2(\mathbf{n}+1+j+ph) 1_{\mathbf{n}+1 = s+1\ (a)} + o_{A\to\infty}(1)$$

$$= \mathbb{E} c_p \sum_{j=2}^{H+1} 1_{\mathbf{n}+j = pb\ (ap)} g_1(\mathbf{n}+j) g_2(\mathbf{n}+j+ph) 1_{\mathbf{n} = s\ (a)} + o_{A\to\infty}(1).$$

Note that the difference between $\sum_{j=2}^{H+1} 1_{\mathbf{n}+j = pb\ (ap)} g_1(\mathbf{n}+j) g_2(\mathbf{n}+j+ph)$ and $\sum_{j=1}^{H} 1_{\mathbf{n}+j = pb\ (ap)} g_1(\mathbf{n}+j) g_2(\mathbf{n}+j+ph)$ is zero with probability $1 - O(1/p)$, and is $O(1)$ in the remaining event. Absorbing the $o_{A\to\infty}(1)$ error in the $O(1/p)$ error, we conclude that

$$Q(s+1) = Q(s) + O(1/p)$$

for all $s \in \mathbb{Z}/a\mathbb{Z}$. Thus $Q$ fluctuates by at most $O(a/p)$, and in particular

$$Q(0) = \frac{1}{a} \sum_{s \in \mathbb{Z}/a\mathbb{Z}} Q(s) + O(a/p).$$

Combining this with (2.16), we conclude that

$$\mathbb{E} c_p \sum_{j=1}^H 1_{\mathbf{n}+j = pb\ (ap)} g_1(\mathbf{n}+j) g_2(\mathbf{n}+j+ph) 1_{\mathbf{n} = 0\ (a)} = \frac{H}{ap} X + O(a/p).$$

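Summing over $\mathcal{P}_H$, we conclude that

$$\mathbb{E} \sum_{j=1}^H \sum_{p \in \mathcal{P}_H} c_p 1_{\mathbf{n}+j = pb\ (ap)} g_1(\mathbf{n}+j) g_2(\mathbf{n}+j+ph) 1_{\mathbf{n} = 0\ (a)} = \left( \frac{H}{a} X + O(a) \right) \sum_{p \in \mathcal{P}_H} \frac{1}{p}$$

and hence by the prime number theorem and the lower bound $|X| \gg \varepsilon$, one has

$$\left| \mathbb{E} \sum_{j=1}^H \sum_{p \in \mathcal{P}_H} c_p 1_{\mathbf{n}+j = pb\ (ap)} g_1(\mathbf{n}+j) g_2(\mathbf{n}+j+ph) 1_{\mathbf{n} = 0\ (a)} \right| \gg \varepsilon \frac{H}{a \log H}.$$

Applying Lemma 2.5 we obtain

$$\left| \mathbb{E} \sum_{j=1}^H \sum_{p \in \mathcal{P}_H} c_p 1_{a\mathbf{n}+j = pb\ (ap)} g_1(a\mathbf{n}+j) g_2(a\mathbf{n}+j+ph) \right| \gg \varepsilon \frac{H}{\log H}.$$

If $j + ph$ lies outside of the interval $[1, H]$, then $j$ lies in either $[1, |h|\varepsilon^2 H]$ or $[(1 - |h|\varepsilon^2)H, H]$. The contribution of these values of $j$ can be easily estimated to be $O(\sum_{p \in \mathcal{P}_H} \frac{|h| \varepsilon^2 H}{p}) = O(|h| \varepsilon^2 \frac{H}{\log H})$, so from the smallness of $\varepsilon$ we may discard these intervals and conclude the claim. □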


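We will shortly need to deploy the theory of Shannon entropy, at which point we encounter the inconvenient fact that $g_i$ could potentially take an infinite number of values and thus have unbounded Shannon entropy. To get around this, we perform a standard discretisation. Namely, define $g_{i,\varepsilon^2}(n)$ for $i = 1,2$ to be $g_i(n)$ rounded to the nearest element of the lattice $\varepsilon^2 \mathbb{Z}[i]$, where $\mathbb{Z}[i]$ denotes the Gaussian integers. (We break ties arbitrarily.) This function is no longer multiplicative, but it takes at most $O_\varepsilon(1)$ values, it is bounded in magnitude by $O(1)$, and we have $g_{i,\varepsilon^2} = g_i + O(\varepsilon^2)$ for $i = 1,2$. Thus from the above proposition and the triangle inequality, we have

$$\left| \mathbb{E} \sum_{p \in \mathcal{P}_H} c_p \sum_{j:\ j, j+ph \in [1,H]} 1_{a\mathbf{n}+j = pb\ (ap)} g_{1,\varepsilon^2}(a\mathbf{n}+j) g_{2,\varepsilon^2}(a\mathbf{n}+j+ph) \right| \gg \varepsilon \frac{H}{\log H},$$

since the error incurred by replacing $g_i$ with $g_{i,\varepsilon^2}$ can be computed to be $O(\varepsilon^2 \sum_{p \in \mathcal{P}_H} \frac{H}{p}) = O(\varepsilon^2 \frac{H}{\log H})$. We rewrite this inequality as

$$|\mathbb{E} F(\mathbf{X}_H, \mathbf{Y}_H)| \gg \varepsilon \frac{H}{\log H} \tag{2.17}$$

where $\mathbf{X}_H$ is the discrete random variable

$$\mathbf{X}_H := (g_{i,\varepsilon^2}(a\mathbf{n}+j))_{i=1,2;\ j=1,\dots,H}$$

(taking values in $(\varepsilon^2 \mathbb{Z}[i])^{2H}$), $\mathbf{Y}_H$ is the random variable

$$\mathbf{Y}_H := \mathbf{n}\ (P_H)$$

(taking values in $\mathbb{Z}/P_H\mathbb{Z}$) where $P_H := \prod_{p \in \mathcal{P}_H} p$, and $F : (\varepsilon^2 \mathbb{Z}[i])^{2H} \times \mathbb{Z}/P_H\mathbb{Z} \to \mathbb{C}$ is the function

$$F((x_{i,j})_{i=1,2;\ j=1,\dots,H},\ y\ (P_H)) := \sum_{p \in \mathcal{P}_H} c_p \sum_{j:\ j, j+ph \in [1,H]} 1_{ay+j = pb\ (ap)} x_{1,j} x_{2,j+ph}. \tag{2.18}$$

(Note that the residue class $ay\ (ap)$ is well defined for $y \in \mathbb{Z}/P_H\mathbb{Z}$ and $p \in \mathcal{P}_H$, noting that $P_H$ is coprime to $a$.)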
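
The discretisation step above can be illustrated numerically. The following Python sketch is not part of the argument: the helper name `round_to_lattice`, the sample points and the choice $\varepsilon = 0.1$ are assumptions made purely for illustration. It rounds points of the unit circle to the lattice $\varepsilon^2 \mathbb{Z}[i]$ and records the largest rounding error, which should be $O(\varepsilon^2)$.

```python
import cmath

def round_to_lattice(z: complex, step: float) -> complex:
    """Round z to a nearest point of the lattice step * Z[i] (ties broken by Python's round)."""
    return complex(round(z.real / step) * step, round(z.imag / step) * step)

eps = 0.1            # illustrative value of epsilon (an assumption)
step = eps ** 2      # lattice spacing epsilon^2

# Hypothetical sample values on the unit circle, standing in for values g_i(n).
samples = [cmath.exp(2j * cmath.pi * k / 17) for k in range(17)]
rounded = [round_to_lattice(z, step) for z in samples]

# Each coordinate moves by at most step / 2, so the error is at most step / sqrt(2).
max_error = max(abs(z - w) for z, w in zip(samples, rounded))
```

Since each rounded value lies within $O(\varepsilon^2)$ of a point of the unit disc, only $O_\varepsilon(1)$ lattice points can occur, which is what makes the Shannon entropy of each component finite.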

It is thus of interest to try to calculate the typical value of $F(\mathbf{X}_H, \mathbf{Y}_H)$. One can interpret $F(\mathbf{X}_H, \mathbf{Y}_H)$ as a “bilinear” expression of the components of $\mathbf{X}_H$ along a certain random graph determined by $\mathbf{Y}_H$. A key difficulty is that the random variables $\mathbf{X}_H$ and $\mathbf{Y}_H$ are not independent, and could potentially be coupled together in an adversarial fashion. In this worst case, this would require one to establish a suitable “expander” property for the random graph associated to $\mathbf{Y}_H$ that would ensure cancellation in the sum regardless of what values $\mathbf{X}_H$ takes. It may well be that such an expansion property* holds (with high probability, of course). However, we can avoid having to establish such a strong expansion property by taking advantage of an “entropy decrement argument” to give some weak independence between $\mathbf{X}_H$ and $\mathbf{Y}_H$ for at least one choice of $H$ between $H_-$ and $H_+$. Once one obtains such a weak independence, it turns out that one only needs to show that, for a typical choice of $\mathbf{X}_H$, the quantity $F(\mathbf{X}_H, \mathbf{Y}_H)$ is small for most choices of $\mathbf{Y}_H$, where we allow a (nearly) exponentially small failure set for the $\mathbf{Y}_H$. This turns out to be much easier to establish than the expander graph property, being obtainable from standard concentration of measure inequalities (such as Hoeffding’s inequality), and an application of the Hardy-Littlewood circle method.

Remark 2.7. The entropy decrement argument we give below can be viewed as a quantitative variant of the construction of the Kolmogorov-Sinai entropy of a topological dynamical system (see e.g. [?]), but we will not explicitly use the language of topological dynamics here. See however [?] for a discussion of the Chowla conjecture and its relation to a conjecture of Sarnak [?] from a topological dynamics point of view. It may well be that the arguments here could also benefit from a more explicit use of topological dynamics machinery.

* Actually, to be able to plausibly expect expansion, one should enlarge $\mathcal{P}_H$ to be something like the primes between $H^\delta$ and $\varepsilon^2 H$ for some small $\delta$, so that the average degree of the random graph associated to $\mathbf{Y}_H$ is significantly larger than one.

3. The entropy decrement argument


We continue the proof of Theorem 1.3. We begin by briefly reviewing the basic Shannon inequalities from information theory.

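Recall that if $\mathbf{X}$ is a discrete random variable (taking at most countably many values), the Shannon entropy $\mathbb{H}(\mathbf{X})$ is defined* by the formula

$$\mathbb{H}(\mathbf{X}) := \sum_x \mathbb{P}(\mathbf{X} = x) \log \frac{1}{\mathbb{P}(\mathbf{X} = x)}$$

where $x$ takes values in the essential range of $\mathbf{X}$ (that is to say, those $x$ for which $\mathbb{P}(\mathbf{X} = x)$ is nonzero). A standard computation then gives the identity

$$\mathbb{H}(\mathbf{X}, \mathbf{Y}) = \mathbb{H}(\mathbf{X}|\mathbf{Y}) + \mathbb{H}(\mathbf{Y}) = \mathbb{H}(\mathbf{X}) + \mathbb{H}(\mathbf{Y}|\mathbf{X}) \tag{3.1}$$

for the joint entropy $\mathbb{H}(\mathbf{X}, \mathbf{Y})$ of the random variable $(\mathbf{X}, \mathbf{Y})$, where the conditional entropy $\mathbb{H}(\mathbf{X}|\mathbf{Y})$ is defined by the formulae

$$\mathbb{H}(\mathbf{X}|\mathbf{Y}) := \sum_y \mathbb{P}(\mathbf{Y} = y) \mathbb{H}(\mathbf{X}|\mathbf{Y} = y) \tag{3.2}$$

(with $y$ ranging over the essential range of $\mathbf{Y}$) and

$$\mathbb{H}(\mathbf{X}|\mathbf{Y} = y) := \sum_x \mathbb{P}(\mathbf{X} = x|\mathbf{Y} = y) \log \frac{1}{\mathbb{P}(\mathbf{X} = x|\mathbf{Y} = y)}$$

with $\mathbb{P}(E|F) := \mathbb{P}(E \wedge F)/\mathbb{P}(F)$ being the conditional probability of $E$ relative to $F$, and the sum is over the essential range of $\mathbf{X}$ conditioned to $\mathbf{Y} = y$. From the concavity of the function $x \mapsto x \log \frac{1}{x}$ and Jensen’s inequality we have

$$\mathbb{H}(\mathbf{X}|\mathbf{Y}) \le \mathbb{H}(\mathbf{X}) \tag{3.3}$$

so we conclude the subadditivity of entropy

$$\mathbb{H}(\mathbf{X}, \mathbf{Y}) \le \mathbb{H}(\mathbf{X}) + \mathbb{H}(\mathbf{Y}). \tag{3.4}$$

If we define the mutual information

$$\mathbb{I}(\mathbf{X}, \mathbf{Y}) := \mathbb{H}(\mathbf{X}) + \mathbb{H}(\mathbf{Y}) - \mathbb{H}(\mathbf{X}, \mathbf{Y}) = \mathbb{H}(\mathbf{X}) - \mathbb{H}(\mathbf{X}|\mathbf{Y}) = \mathbb{H}(\mathbf{Y}) - \mathbb{H}(\mathbf{Y}|\mathbf{X}) \tag{3.5}$$

between two discrete random variables $\mathbf{X}, \mathbf{Y}$, we thus see that $\mathbb{I}(\mathbf{X}, \mathbf{Y}) = \mathbb{I}(\mathbf{Y}, \mathbf{X}) \ge 0$.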
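
As a sanity check on the identities (3.1), (3.3), (3.4) and (3.5), one can evaluate them on a small explicit joint distribution. The Python sketch below is only an illustration: the joint law `joint` and the helper names are hypothetical and do not come from the paper. Entropies use the natural logarithm, as in the text.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy (natural log) of a dict mapping outcomes to probabilities."""
    return sum(p * math.log(1.0 / p) for p in dist.values() if p > 0)

def marginal(joint, axis):
    """Marginal law of coordinate `axis` (0 for X, 1 for Y) of a joint law on pairs."""
    out = defaultdict(float)
    for pair, p in joint.items():
        out[pair[axis]] += p
    return dict(out)

def conditional_entropy(joint, given_axis):
    """H(X|Y) when given_axis == 1, H(Y|X) when given_axis == 0, following (3.2)."""
    total = 0.0
    for g, pg in marginal(joint, given_axis).items():
        cond = {pair: p / pg for pair, p in joint.items() if pair[given_axis] == g}
        total += pg * entropy(cond)
    return total

# Hypothetical joint law of (X, Y) on {0,1} x {0,1,2}; the probabilities sum to 1.
joint = {(0, 0): 0.30, (0, 1): 0.10, (0, 2): 0.10,
         (1, 0): 0.05, (1, 1): 0.25, (1, 2): 0.20}

h_xy = entropy(joint)
h_x = entropy(marginal(joint, 0))
h_y = entropy(marginal(joint, 1))
h_x_given_y = conditional_entropy(joint, 1)
h_y_given_x = conditional_entropy(joint, 0)
mutual_information = h_x + h_y - h_xy
```

On this example the mutual information is strictly positive, reflecting the fact that the two coordinates of `joint` are not independent.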


Remark 3.1. One can view $\mathbb{I}(\mathbf{X}, \mathbf{Y})$ as a measure of the extent to which the random variables $\mathbf{X}, \mathbf{Y}$ are not independent. For instance, one can show that $\mathbb{I}(\mathbf{X}, \mathbf{Y}) = 0$ if and only if $\mathbf{X}$ and $\mathbf{Y}$ are jointly independent. In a similar vein, one can view the conditional entropy $\mathbb{H}(\mathbf{X}|\mathbf{Y})$ as a measure of the amount of new information carried by $\mathbf{X}$, given that one already knows the value of $\mathbf{Y}$.

* In the information theory literature, the logarithm to base 2 is often used to define entropy, rather than the natural logarithm, in which case $\mathbb{H}(\mathbf{X})$ can be interpreted as the number of bits needed to describe $\mathbf{X}$ on the average. One could use this choice of base in the arguments below if desired, but ultimately the choice of base is a normalisation which has no impact on the final bounds.

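Conditioning the random variables $\mathbf{X}, \mathbf{Y}$ to an auxiliary discrete random variable $\mathbf{Z}$, we conclude the relative subadditivity of entropy

$$\mathbb{H}(\mathbf{X}, \mathbf{Y}|\mathbf{Z}) \le \mathbb{H}(\mathbf{X}|\mathbf{Z}) + \mathbb{H}(\mathbf{Y}|\mathbf{Z}). \tag{3.6}$$

Finally, a further application of Jensen’s inequality gives the bound

$$\mathbb{H}(\mathbf{X}) \le \log N \tag{3.7}$$

whenever $\mathbf{X}$ takes on at most $N$ values.

Recall the discrete random variables $\mathbf{X}_H$, $\mathbf{Y}_H$ defined previously. From (3.7), (3.4), and the fact that each component of $\mathbf{X}_H$ takes on only $O_\varepsilon(1)$ values, we have the upper bound

$$0 \le \mathbb{H}(\mathbf{X}_H) \ll_\varepsilon H. \tag{3.8}$$

Note that $\mathbf{Y}_H$ is within $o_{A\to\infty}(1)$ (in any reasonable metric) of being uniformly distributed on $\mathbb{Z}/P_H\mathbb{Z}$, thus

$$\mathbb{H}(\mathbf{Y}_H) = \log P_H - o_{A\to\infty}(1). \tag{3.9}$$

In particular, from the prime number theorem we have the crude bound

$$\mathbb{H}(\mathbf{Y}_H) \ll H \tag{3.10}$$

for all $H_- \le H \le H_+$.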

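Let us temporarily define the variant

$$\mathbf{X}_{H_1,H_2} := (g_{i,\varepsilon^2}(a\mathbf{n}+j))_{i=1,2;\ H_1 < j \le H_2}$$

of $\mathbf{X}_H$, where $H_1, H_2$ are natural numbers. From the approximate translation invariance provided by Lemma 2.5 we see that

$$\mathbb{H}(\mathbf{X}_{H_1,H_1+H_2}) = \mathbb{H}(\mathbf{X}_{H_2}) + o_{A\to\infty}(1)$$

for any $H_1, H_2 \le H_+$; applying (3.4), and noting that $\mathbf{X}_{H_1+H_2}$ is the concatenation of $\mathbf{X}_{H_1}$ and $\mathbf{X}_{H_1,H_1+H_2}$, we obtain the approximate subadditivity property

$$\mathbb{H}(\mathbf{X}_{H_1+H_2}) \le \mathbb{H}(\mathbf{X}_{H_1}) + \mathbb{H}(\mathbf{X}_{H_2}) + o_{A\to\infty}(1) \tag{3.11}$$

for any natural numbers $H_1, H_2 \le H_+$.

We can improve this inequality if $\mathbf{X}_H$ shares some mutual information with $\mathbf{Y}_H$, as $\mathbf{Y}_H$ does not generate any entropy upon translation. Indeed, from Lemma 2.5 again, we see for any natural numbers $H, H_1, H_2$ between $H_-$ and $H_+$ that

$$\mathbb{H}(\mathbf{X}_{H_1,H_1+H_2} | \mathbf{n}+H_1\ (P_H)) = \mathbb{H}(\mathbf{X}_{H_2} | \mathbf{n}\ (P_H)) + o_{A\to\infty}(1).$$

But $\mathbf{n}+H_1\ (P_H)$ conveys exactly the same information as $\mathbf{n}\ (P_H)$ (they generate exactly the same finite $\sigma$-algebra of events), so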

$$\mathbb{H}(\mathbf{X}_{H_1,H_1+H_2} | \mathbf{n}+H_1\ (P_H)) = \mathbb{H}(\mathbf{X}_{H_1,H_1+H_2} | \mathbf{n}\ (P_H)).$$

Inserting these identities into (3.6) and recalling that $\mathbf{Y}_H = \mathbf{n}\ (P_H)$, we obtain the relative approximate subadditivity property

$$\mathbb{H}(\mathbf{X}_{H_1+H_2}|\mathbf{Y}_H) \le \mathbb{H}(\mathbf{X}_{H_1}|\mathbf{Y}_H) + \mathbb{H}(\mathbf{X}_{H_2}|\mathbf{Y}_H) + o_{A\to\infty}(1)$$

for any $H, H_1, H_2$ between $H_-$ and $H_+$. Iterating this, we conclude in particular that

$$\mathbb{H}(\mathbf{X}_{kH}|\mathbf{Y}_H) \le k \mathbb{H}(\mathbf{X}_H|\mathbf{Y}_H) + o_{A\to\infty}(1)$$

for any natural numbers $k, H$ with $H_- \le H \le kH \le H_+$ (note that the number of iterations here is at most $H_+$, so that the $o_{A\to\infty}(1)$ error stays under control). From this and (3.1), (3.5) we see that

$$\mathbb{H}(\mathbf{X}_{kH}) = \mathbb{H}(\mathbf{X}_{kH}|\mathbf{Y}_H) + \mathbb{H}(\mathbf{Y}_H) - \mathbb{H}(\mathbf{Y}_H|\mathbf{X}_{kH})$$
$$\le \mathbb{H}(\mathbf{X}_{kH}|\mathbf{Y}_H) + \mathbb{H}(\mathbf{Y}_H)$$
$$\le k \mathbb{H}(\mathbf{X}_H|\mathbf{Y}_H) + \mathbb{H}(\mathbf{Y}_H) + o_{A\to\infty}(1)$$
$$= k \mathbb{H}(\mathbf{X}_H) - k \mathbb{I}(\mathbf{X}_H, \mathbf{Y}_H) + \mathbb{H}(\mathbf{Y}_H) + o_{A\to\infty}(1)$$

which on dividing by $kH$ and using (3.10) gives

$$\frac{\mathbb{H}(\mathbf{X}_{kH})}{kH} \le \frac{\mathbb{H}(\mathbf{X}_H)}{H} - \frac{\mathbb{I}(\mathbf{X}_H, \mathbf{Y}_H)}{H} + O\left(\frac{1}{k}\right) \tag{3.12}$$

whenever $H_- \le H \le kH \le H_+$ (note that we can absorb the $o_{A\to\infty}(1)$ error in the $O(1/k)$ term since $k \le H_+$). This can be compared with the inequality

$$\frac{\mathbb{H}(\mathbf{X}_{kH})}{kH} \le \frac{\mathbb{H}(\mathbf{X}_H)}{H} + o_{A\to\infty}(1)$$

under the same hypotheses on $H, k$, coming from iterating (3.11). Thus we see that the presence of mutual information between $\mathbf{X}_H$ and $\mathbf{Y}_H$ causes a decrement in the entropy rate of $\mathbf{X}_H$ as one increases $H$.

We can iterate this inequality and use an “entropy decrement argument” to get a non-trivial upper bound on the mutual information $\mathbb{I}(\mathbf{X}_H, \mathbf{Y}_H)$ for some large $H$:

Lemma 3.2 (Entropy decrement argument). There exists a natural number $H$ between $H_-$ and $H_+$, which is a multiple of $a$, and such that

$$\mathbb{I}(\mathbf{X}_H, \mathbf{Y}_H) \le \frac{H}{\log H \log\log\log H}.$$

As we shall see later, the key point here is that this bound is not only better than the trivial bound of $O(H)$ coming from (3.10), but is (barely!) smaller than $H/\log H$ in the limit as $H \to \infty$; in particular, the mutual information between $\mathbf{X}_H$ and $\mathbf{Y}_H$ is smaller than the number $|\mathcal{P}_H|$ of primes one is using to define $F(\mathbf{X}_H, \mathbf{Y}_H)$. One may think of this lemma as providing a weak independence between $\mathbf{X}_H$ and $\mathbf{Y}_H$ for certain large $H$. For the purposes of optimising the bounds, it appears to be slightly more efficient to prove a variant of this lemma with a slightly different right-hand side; we leave the details to the interested reader.

Proof. Suppose for sake of contradiction that one has

$$\mathbb{I}(\mathbf{X}_H, \mathbf{Y}_H) > \frac{H}{\log H \log\log\log H}$$

for all $H_- \le H \le H_+$ that are multiples of $a$. Let $C_0$ be a sufficiently large natural number depending on $H_-$, and let $J$ be a sufficiently large natural number depending on $C_0, H_-, \varepsilon$. We may assume that $H_+$ is sufficiently large depending on $C_0, J$. The idea is to now repeatedly use (3.12) to decrement the entropy ratio $\frac{\mathbb{H}(\mathbf{X}_H)}{H}$ as $H$ increases, until one arrives at the absurd situation of a random variable with negative entropy.

Let us recursively define the natural numbers $H_- \le H_1 \le H_2 \le \dots \le H_J$ by setting $H_1 := aH_-$ and

$$H_{j+1} := H_j \lfloor C_0 \log H_j \log\log\log H_j \rfloor$$

for all $1 \le j < J$. Note that if $H_+$ is sufficiently large depending on $H_-, C_0, J$, then all the $H_j$ will lie between $H_-$ and $H_+$ and are multiples of $a$. For $C_0$ large enough, we see from (3.12) with $H, k$ replaced by $H_j$ and $\lfloor C_0 \log H_j \log\log\log H_j \rfloor$ respectively, followed by (3.7), that

$$\frac{\mathbb{H}(\mathbf{X}_{H_{j+1}})}{H_{j+1}} \le \frac{\mathbb{H}(\mathbf{X}_{H_j})}{H_j} - \frac{1}{2 \log H_j \log\log\log H_j}$$

for all $1 \le j < J$. (The $o_{A\to\infty}(1)$ error may be absorbed as we are assuming $A$ to be large.) On the other hand, an easy induction* shows that there exists $B \ge 10^{10}$ (depending on $C_0, H_-$) such that

$$H_j \le \exp(Bj \log j)$$

for all $2 \le j \le J$. Thus we have

$$\frac{\mathbb{H}(\mathbf{X}_{H_{j+1}})}{H_{j+1}} \le \frac{\mathbb{H}(\mathbf{X}_{H_j})}{H_j} - \frac{1}{2Bj \log j \log\log(Bj \log j)}$$

for all $2 \le j < J$, which on telescoping using (3.8) gives the bound

$$\sum_{j=2}^{J-1} \frac{1}{2Bj \log j \log\log(Bj \log j)} \ll_\varepsilon 1.$$

But the sum on the left-hand side diverges (very slowly!) in the limit $J \to \infty$, and so we obtain a contradiction by choosing $J$ (and then $H_+$) large enough. □
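
To get a feeling for how slowly the decrements in the above proof accumulate, one can iterate the recursion numerically. The Python sketch below is an illustration only: the starting scale $10^6$, the constant $C_0 = 10$, the number of steps $J = 200$ and all function names are assumptions chosen for convenience, not values taken from the argument.

```python
import math

def log3(H: int) -> float:
    """The triple logarithm log log log H (needs H > e^e to be positive)."""
    return math.log(math.log(math.log(H)))

def next_scale(H: int, C0: int) -> int:
    """One step of the recursion H_{j+1} = H_j * floor(C0 * log H_j * logloglog H_j)."""
    return H * math.floor(C0 * math.log(H) * log3(H))

def scales(H1: int, C0: int, J: int) -> list:
    """The scales H_1, ..., H_J as exact Python integers."""
    out = [H1]
    while len(out) < J:
        out.append(next_scale(out[-1], C0))
    return out

def total_decrement(hs) -> float:
    """Sum of 1 / (2 log H_j logloglog H_j): the entropy-rate loss that would be
    forced if the mutual information bound failed at every scale in hs."""
    return sum(1.0 / (2 * math.log(H) * log3(H)) for H in hs)

hs = scales(10**6, 10, 200)          # illustrative H_1, C_0, J (assumptions)
partial = [total_decrement(hs[:k]) for k in (50, 100, 200)]
```

The partial sums in `partial` keep increasing but remain small even after 200 steps, consistent with the remark that the divergence is very slow; this is why $J$, and then $H_+$, must be taken extremely large.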

®Alternatively, one can proceed by noting that for any given T ^ H -, there are 
» values of Hj between T and H if J is large enough, which is sufficient 

to get some divergence in Yi^i 2 log n, log log log n, as J ^ co. 






















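From the above lemma we can find an $H$ between $H_-$ and $H_+$ that is a multiple of $a$, such that

$$\mathbb{I}(\mathbf{X}_H, \mathbf{Y}_H) = o_{H_-\to\infty}\left(\frac{H}{\log H}\right). \tag{3.13}$$

Fix this value of $H$. From (3.5) and (3.13) we have

$$\sum_x \mathbb{P}(\mathbf{X}_H = x) \left( \mathbb{H}(\mathbf{Y}_H) - \mathbb{H}(\mathbf{Y}_H|\mathbf{X}_H = x) \right) = o_{H_-\to\infty}\left(\frac{H}{\log H}\right).$$

By (3.7), (3.9), the summands are bounded below by $-o_{A\to\infty}(1)$. Thus, if we call a value $x$ good if one has

$$\mathbb{H}(\mathbf{Y}_H) - \mathbb{H}(\mathbf{Y}_H|\mathbf{X}_H = x) = o_{H_-\to\infty}\left(\frac{H}{\log H}\right), \tag{3.14}$$

we see from Markov’s inequality that the random variable $\mathbf{X}_H$ will attain a good value with probability $1 - o_{H_-\to\infty}(1)$.

Informally, if $x$ is good, then $\mathbf{Y}_H$ remains somewhat uniformly distributed across $\mathbb{Z}/P_H\mathbb{Z}$ even after one conditions $\mathbf{X}_H$ to equal $x$, in the sense that this conditioned random variable cannot concentrate too much mass into a small region. More precisely, we have

Lemma 3.3 (Weak uniform distribution). Let $x$ be a good value. Let $E_x$ be a subset of $\mathbb{Z}/P_H\mathbb{Z}$ (which can depend on $x$) of cardinality

$$|E_x| \le \exp\left(-\varepsilon^7 \frac{H}{\log H}\right) P_H.$$

Then one has

$$\mathbb{P}(\mathbf{Y}_H \in E_x | \mathbf{X}_H = x) = o_{H_-\to\infty}(1).$$

The quantity $\varepsilon^7$ here could be replaced by any other function of $\varepsilon$, but we use this particular choice to match with Lemma 3.5 below.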

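Proof. Applying (3.1) (conditioned to the event $\mathbf{X}_H = x$) we have

$$\mathbb{H}(\mathbf{Y}_H | \mathbf{X}_H = x, 1_{E_x}(\mathbf{Y}_H)) = \mathbb{H}(\mathbf{Y}_H | \mathbf{X}_H = x) + \mathbb{H}(1_{E_x}(\mathbf{Y}_H) | \mathbf{Y}_H, \mathbf{X}_H = x) - \mathbb{H}(1_{E_x}(\mathbf{Y}_H) | \mathbf{X}_H = x)$$
$$\ge \mathbb{H}(\mathbf{Y}_H | \mathbf{X}_H = x) - \mathbb{H}(1_{E_x}(\mathbf{Y}_H) | \mathbf{X}_H = x).$$

By (3.2) (again conditioned to the event $\mathbf{X}_H = x$), the left-hand side may be expanded as

$$\mathbb{P}(\mathbf{Y}_H \in E_x | \mathbf{X}_H = x) \mathbb{H}(\mathbf{Y}_H | \mathbf{X}_H = x, \mathbf{Y}_H \in E_x) + \mathbb{P}(\mathbf{Y}_H \notin E_x | \mathbf{X}_H = x) \mathbb{H}(\mathbf{Y}_H | \mathbf{X}_H = x, \mathbf{Y}_H \notin E_x)$$

and thus by (3.14)

$$\mathbb{P}(\mathbf{Y}_H \in E_x | \mathbf{X}_H = x) \mathbb{H}(\mathbf{Y}_H | \mathbf{X}_H = x, \mathbf{Y}_H \in E_x) + \mathbb{P}(\mathbf{Y}_H \notin E_x | \mathbf{X}_H = x) \mathbb{H}(\mathbf{Y}_H | \mathbf{X}_H = x, \mathbf{Y}_H \notin E_x)$$
$$\ge \mathbb{H}(\mathbf{Y}_H) - \mathbb{H}(1_{E_x}(\mathbf{Y}_H) | \mathbf{X}_H = x) - o_{H_-\to\infty}\left(\frac{H}{\log H}\right).$$

By (3.7), $\mathbb{H}(1_{E_x}(\mathbf{Y}_H) | \mathbf{X}_H = x)$ is bounded by $\log 2$ and so this term can be absorbed in the $o_{H_-\to\infty}(H/\log H)$ error. From (3.3) we have

$$\mathbb{H}(\mathbf{Y}_H | \mathbf{X}_H = x, \mathbf{Y}_H \notin E_x) \le \mathbb{H}(\mathbf{Y}_H)$$

and hence

$$\mathbb{P}(\mathbf{Y}_H \in E_x | \mathbf{X}_H = x) \left( \mathbb{H}(\mathbf{Y}_H) - \mathbb{H}(\mathbf{Y}_H | \mathbf{X}_H = x, \mathbf{Y}_H \in E_x) \right) \le o_{H_-\to\infty}\left(\frac{H}{\log H}\right).$$

But from (3.7) one has

$$\mathbb{H}(\mathbf{Y}_H | \mathbf{X}_H = x, \mathbf{Y}_H \in E_x) \le \log |E_x| \le \log P_H - \varepsilon^7 \frac{H}{\log H}$$

and the claim then follows from (3.9) (recalling that $H_-$ is large depending on $\varepsilon$). □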

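Remark 3.4. Lemma 3.3 may also be derived from the data processing inequality

$$D_{KL}(1_{E_x}(\mathbf{Y}'_H) \| 1_{E_x}(\mathbf{Y}_H)) \le D_{KL}(\mathbf{Y}'_H \| \mathbf{Y}_H)$$

where $\mathbf{Y}'_H$ is the random variable $\mathbf{Y}_H$ conditioned to the event $\mathbf{X}_H = x$, and where $D_{KL}(\mathbf{X} \| \mathbf{Y}) := \sum_x \mathbb{P}(\mathbf{X} = x) \log \frac{\mathbb{P}(\mathbf{X} = x)}{\mathbb{P}(\mathbf{Y} = x)}$ denotes the Kullback-Leibler divergence; we leave the details of this alternate derivation to the interested reader. (Thanks to Yihong Wu for this observation.)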

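We can use this weak uniform distribution to show that $F(\mathbf{X}_H, \mathbf{Y}_H)$ concentrates as a function of $\mathbf{Y}_H$. We first observe

Lemma 3.5 (Hoeffding inequality). Let $x$ lie in the range of $\mathbf{X}_H$. Let $E_x$ denote the set of all $y \in \mathbb{Z}/P_H\mathbb{Z}$ such that

$$\left| F(x,y) - \frac{1}{P_H} \sum_{y' \in \mathbb{Z}/P_H\mathbb{Z}} F(x,y') \right| \ge \varepsilon^2 \frac{H}{\log H}.$$

Then

$$|E_x| \le \exp\left(-\varepsilon^7 \frac{H}{\log H}\right) P_H.$$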


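Proof. We interpret this inequality probabilistically. Let $\mathbf{y}$ be drawn uniformly at random from $\mathbb{Z}/P_H\mathbb{Z}$; then our task is to show that

$$\mathbb{P}\left( |F(x,\mathbf{y}) - \mathbb{E} F(x,\mathbf{y})| \ge \varepsilon^2 \frac{H}{\log H} \right) \le \exp\left(-\varepsilon^7 \frac{H}{\log H}\right).$$

We can write

$$F(x,\mathbf{y}) = \sum_{p \in \mathcal{P}_H} F_p(x,\mathbf{y})$$

where

$$F_p(x,\mathbf{y}) := c_p \sum_{j:\ j, j+ph \in [1,H]} 1_{a\mathbf{y}+j = pb\ (ap)} x_{1,j} x_{2,j+ph}. \tag{3.15}$$

Note that the only randomness in the quantity $F_p(x,\mathbf{y})$ comes from the reduction $\mathbf{y}\ (p)$ of $\mathbf{y}$ modulo $p$. Since $\mathbf{y}$ is uniformly distributed in $\mathbb{Z}/P_H\mathbb{Z}$, we see from the Chinese remainder theorem that the $\mathbf{y}\ (p)$ are uniformly distributed in $\mathbb{Z}/p\mathbb{Z}$ and are jointly independent in $p$. As each $F_p(x,\mathbf{y})$ is a deterministic function of $\mathbf{y}\ (p)$, we conclude that the $F_p(x,\mathbf{y})$ are also jointly independent in $p$. On the other hand, since all $p \in \mathcal{P}_H$ lie in the interval $\frac{\varepsilon^2}{2} H \le p \le \varepsilon^2 H$, we have the deterministic bound $|F_p(x,\mathbf{y})| \le C/\varepsilon^2$ for some absolute constant $C$. Applying the Hoeffding inequality [18], we conclude that

$$\mathbb{P}\left( |F(x,\mathbf{y}) - \mathbb{E} F(x,\mathbf{y})| \ge \varepsilon^2 \frac{H}{\log H} \right) \ll \exp\left( - \frac{2 (\varepsilon^2 H/\log H)^2}{(2C/\varepsilon^2)^2 |\mathcal{P}_H|} \right).$$

From the prime number theorem we have $|\mathcal{P}_H| \ll \varepsilon^2 \frac{H}{\log H}$, and the claim follows (as $\varepsilon$ is small and $H$ is large). □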
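
The shape of the Hoeffding bound used above can be checked exactly in the simplest model case. The Python sketch below is an illustration under stated assumptions, not the setting of the lemma: it replaces the variables $F_p$ by $n$ independent uniform $\pm 1$ signs (so each range $b_i - a_i$ equals 2), and the function names are hypothetical. It compares the exact two-sided tail with Hoeffding's bound $2\exp(-2t^2/\sum_i (b_i - a_i)^2)$.

```python
import math

def exact_tail(n: int, t: float) -> float:
    """P(|S| >= t) for S a sum of n independent uniform +-1 signs, computed exactly."""
    count = 0
    for k in range(n + 1):            # k = number of +1 signs, so S = 2k - n
        if abs(2 * k - n) >= t:
            count += math.comb(n, k)
    return count / 2 ** n

def hoeffding_bound(n: int, t: float) -> float:
    """Hoeffding's bound 2 exp(-2 t^2 / sum (b_i - a_i)^2), each range being 2."""
    return 2 * math.exp(-2 * t * t / (4 * n))

n = 200                                # illustrative number of summands (an assumption)
checks = [(t, exact_tail(n, t), hoeffding_bound(n, t)) for t in range(0, n + 1, 10)]
```

In the lemma the roles of $n$ and $t$ are played by $|\mathcal{P}_H|$ and $\varepsilon^2 H/\log H$, with ranges of size $O(1/\varepsilon^2)$, which is what produces an exponent of order $\varepsilon^6 H/\log H$.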


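Combining this lemma with Lemma 3.3 we conclude that for any good $x$, one has

$$\mathbb{P}\left( \left| F(x, \mathbf{Y}_H) - \frac{1}{P_H} \sum_{y \in \mathbb{Z}/P_H\mathbb{Z}} F(x,y) \right| \ge \varepsilon^2 \frac{H}{\log H} \ \Big|\ \mathbf{X}_H = x \right) = o_{H_-\to\infty}(1).$$

By Fubini’s theorem, and the fact that $\mathbf{X}_H$ is good with probability $1 - o_{H_-\to\infty}(1)$, one thus has

$$F(\mathbf{X}_H, \mathbf{Y}_H) = \frac{1}{P_H} \sum_{y \in \mathbb{Z}/P_H\mathbb{Z}} F(\mathbf{X}_H, y) + O\left( \varepsilon^2 \frac{H}{\log H} \right)$$

with probability $1 - o_{H_-\to\infty}(1)$. On the other hand, from the triangle inequality, (2.18), and the prime number theorem we have

$$F(x,y) \ll \frac{H}{\log H}.$$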
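We can thus take expectations and conclude that

$$\mathbb{E} F(\mathbf{X}_H, \mathbf{Y}_H) = \mathbb{E} \frac{1}{P_H} \sum_{y \in \mathbb{Z}/P_H\mathbb{Z}} F(\mathbf{X}_H, y) + O\left( \varepsilon^2 \frac{H}{\log H} \right),$$

and hence by (2.17) we have

$$\left| \mathbb{E} \frac{1}{P_H} \sum_{y \in \mathbb{Z}/P_H\mathbb{Z}} F(\mathbf{X}_H, y) \right| \gg \varepsilon \frac{H}{\log H}. \tag{3.16}$$

The advantage here is that we have decoupled the $x$ and $y$ variables, and the $y$ average is now easy to compute. Indeed, from the Chinese remainder theorem and (3.15) we see that

$$\frac{1}{P_H} \sum_{y \in \mathbb{Z}/P_H\mathbb{Z}} F_p(x,y) = \frac{c_p}{p} \sum_{j:\ j, j+ph \in [1,H]} 1_{j = pb\ (a)} x_{1,j} x_{2,j+ph}$$

for any $x$ and any $p \in \mathcal{P}_H$, and on summing in $p$ and inserting into (3.16), we conclude that

$$\left| \mathbb{E} \sum_{p \in \mathcal{P}_H} \frac{c_p}{p} \sum_{j:\ j, j+ph \in [1,H]} 1_{j = pb\ (a)} g_{1,\varepsilon^2}(a\mathbf{n}+j) g_{2,\varepsilon^2}(a\mathbf{n}+j+ph) \right| \gg \varepsilon \frac{H}{\log H}.$$

Since $g_i = g_{i,\varepsilon^2} + O(\varepsilon^2)$ and $g_i, g_{i,\varepsilon^2} = O(1)$ for $i = 1,2$, we can replace $g_{i,\varepsilon^2}$ by $g_i$ on the left-hand side at the cost of an error of $O(\varepsilon^2 \sum_{p \in \mathcal{P}_H} \frac{H}{p}) = O(\varepsilon^2 \frac{H}{\log H})$. We thus have

$$\left| \mathbb{E} \sum_{p \in \mathcal{P}_H} \frac{c_p}{p} \sum_{j:\ j, j+ph \in [1,H]} 1_{j = pb\ (a)} g_1(a\mathbf{n}+j) g_2(a\mathbf{n}+j+ph) \right| \gg \varepsilon \frac{H}{\log H}. \tag{3.17}$$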

On the other hand, by using the Hardy-Littlewood circle method, we can obtain the following deterministic estimate for the expression inside the expectation.

Lemma 3.6 (Circle method estimate). Let $a, H$ be as above (in particular, $H$ is a multiple of $a$). For any $\alpha \in \mathbb{R}/\mathbb{Z}$, let $S_H(\alpha)$ denote the exponential sum

$$S_H(\alpha) := \sum_{p \in \mathcal{P}_H} \frac{c_p}{p} e(\alpha p) \tag{3.18}$$

and let $\Xi_H$ denote the elements $\xi \in \mathbb{Z}/H\mathbb{Z}$ for which



for some $q \in \mathbb{Z}/a\mathbb{Z}$. For $j = 1, \dots, H$, let $x_{1,j}, x_{2,j}$ be complex numbers
bounded in magnitude by one. Then 



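(3.19)

Proof. We extend $x_{1,j}, x_{2,j}$ periodically with period $H$. If we remove the constraint that $j + ph \in [1,H]$, we incur an error of $O(\sum_{p \in \mathcal{P}_H} \frac{1}{p} |h| p) = O(|h| \varepsilon^2 \frac{H}{\log H})$ which is acceptable. Thus, viewing $j$ now as an element of $\mathbb{Z}/H\mathbb{Z}$, we may replace the left-hand side of (3.19) by

$$\sum_{p \in \mathcal{P}_H} \frac{c_p}{p} \sum_{j \in \mathbb{Z}/H\mathbb{Z}} 1_{j = pb\ (a)} x_{1,j} x_{2,j+ph}. \tag{3.20}$$

We perform a Fourier expansion

$$x_{i,j} = \sum_{\xi \in \mathbb{Z}/H\mathbb{Z}} G_i(\xi) e(j\xi/H)$$

for $i = 1,2$, where

$$G_i(\xi) := \frac{1}{H} \sum_{j \in \mathbb{Z}/H\mathbb{Z}} x_{i,j} e(-j\xi/H).$$

We can thus expand (3.20) as

$$\sum_{\xi, \xi' \in \mathbb{Z}/H\mathbb{Z}} G_1(\xi) G_2(-\xi') \sum_{p \in \mathcal{P}_H} \frac{c_p}{p} \sum_{j \in \mathbb{Z}/H\mathbb{Z}} 1_{j = pb\ (a)} e\left( \frac{j\xi}{H} - \frac{(j+ph)\xi'}{H} \right).$$

The inner sum vanishes unless $\xi' = \xi + \frac{qH}{a}$ for some $q \in \mathbb{Z}/a\mathbb{Z}$, in which case one has

$$\sum_{j \in \mathbb{Z}/H\mathbb{Z}} 1_{j = pb\ (a)} e\left( \frac{j\xi}{H} - \frac{(j+ph)\xi'}{H} \right) = \frac{H}{a} e\left( - \frac{p(b+h)q}{a} - \frac{ph\xi}{H} \right)$$

(recall that $H$ was chosen to be a multiple of $a$), and thus by (3.18) we can write (3.20) as

$$\frac{H}{a} \sum_{q \in \mathbb{Z}/a\mathbb{Z}} \sum_{\xi \in \mathbb{Z}/H\mathbb{Z}} G_1(\xi) G_2\left(-\xi - \frac{qH}{a}\right) S_H\left( - \frac{(b+h)q}{a} - \frac{h\xi}{H} \right).$$

From the Cauchy-Schwarz inequality followed by the Plancherel identity, one has

$$\sum_{\xi \in \mathbb{Z}/H\mathbb{Z}} |G_1(\xi)| \left| G_2\left(-\xi - \frac{qH}{a}\right) \right| \ll 1,$$

so those $\xi \notin \Xi_H$ give an acceptable contribution. For the remaining $\xi$, we bound $G_2(-\xi - \frac{qH}{a})$ crudely by $O(1)$ and $S_H(-\frac{(b+h)q}{a} - \frac{h\xi}{H})$ by $O(\frac{1}{\log H})$, and use the triangle inequality to obtain the claim. □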
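
The Fourier-analytic identity at the heart of this proof, namely that (3.20) equals $\frac{H}{a} \sum_q \sum_\xi G_1(\xi) G_2(-\xi - \frac{qH}{a}) S_H(-\frac{(b+h)q}{a} - \frac{h\xi}{H})$, can be verified numerically on a toy example. In the Python sketch below every concrete choice (the values of `H`, `a`, `b`, `h`, the list `primes` standing in for $\mathcal{P}_H$, the phases `c` and the sequences `x1`, `x2`) is an assumption made for illustration only.

```python
import cmath

def e(theta: float) -> complex:
    """The standard character e(theta) = exp(2 pi i theta)."""
    return cmath.exp(2j * cmath.pi * theta)

H, a, b, h = 12, 3, 1, 2                     # illustrative parameters; H is a multiple of a
primes = [5, 7, 11]                           # stand-in for the set P_H
c = {p: e(p / 13.0) for p in primes}          # arbitrary unit phases c_p
x1 = [e(j * j / 7.0) for j in range(H)]       # arbitrary bounded sequences,
x2 = [e(j / 5.0) / 2 for j in range(H)]       # indexed by Z/HZ

def G(x, xi: int) -> complex:
    """Fourier coefficient G(xi) = (1/H) sum_j x_j e(-j xi / H)."""
    return sum(x[j] * e(-j * xi / H) for j in range(H)) / H

def S(alpha: float) -> complex:
    """Exponential sum S_H(alpha) = sum_p (c_p / p) e(alpha p), as in (3.18)."""
    return sum(c[p] / p * e(alpha * p) for p in primes)

# Physical-space side: the expression (3.20), with indices taken modulo H.
lhs = sum(c[p] / p * x1[j] * x2[(j + p * h) % H]
          for p in primes for j in range(H) if (j - p * b) % a == 0)

# Frequency side: (H/a) sum_q sum_xi G_1(xi) G_2(-xi - qH/a) S_H(-(b+h)q/a - h xi/H).
rhs = (H / a) * sum(G(x1, xi) * G(x2, (-xi - q * H // a) % H)
                    * S(-(b + h) * q / a - h * xi / H)
                    for q in range(a) for xi in range(H))
```

The two sides agree up to floating-point error, which is the exact (error-free) part of the argument; the lemma itself then comes from bounding the frequency side term by term.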

Combining this lemma with (3.17), we conclude that

$$\sum_{\xi \in \Xi_H} \mathbb{E} \left| \frac{1}{H} \sum_{j=1}^H g_1(a\mathbf{n}+j) e(-j\xi/H) \right| \gg_{a,h} \varepsilon.$$

By (2.9) we thus have

$$\varepsilon \ll_{a,h} o_{H_-\to\infty}(|\Xi_H|).$$

To conclude the desired contradiction, it thus suffices (by taking $H_-$ large enough) to show

Lemma 3.7 (Restriction theorem for the primes). We have $|\Xi_H| \ll_{a,h,\varepsilon} 1$.


Proof. We invoke [13, Proposition 4.2] (with $p = 4$, $F(n) := n$, and $N$ replaced by $aH$), which gives the bound


heljaHTj 


H 


1/4 


Yj an/3Rin)e{-bn/aH)\‘^ j « 


n=l 



1/2 


for any sequence $a_n$, where $R := (aH)^{1/10}$ and $\beta_R$ is a certain non-negative weight constructed in [13, Proposition 3.1], whose only relevant property here is that $\beta_R(n) \gg \log H$ when $n$ is a prime in $\mathcal{P}_H$.
Setting a„ set equal to when n is a prime in Vh, and a„ = 0 

otherwise, we conclude thato 


E 

k^'LjaH'L 




(3.21) 


and thus by Markov’s inequality we have \Sh{-^)\ ^ for at most 
Oe,a{^) values of /c e 'L/aH'L. The claim follows. □ 

Remark 3.8. In the special case $g_1 = g_2 = \lambda$ (or more generally when $g_2$ is the complex conjugate of $g_1$), we have $c_p = 1$, and the exponential sum $S_H(\alpha)$ can then be handled by the Vinogradov estimates for exponential sums over primes (see e.g. [?, § 13.5]). In that case, one can compute $\Xi_H$ fairly explicitly; it basically consists of those frequencies $\xi$ which are “major arc” in the sense that $\xi/H$ is close to a rational $a/q$ of bounded denominator $q$. As remarked previously, this allows for a slight simplification in the arguments in that the exponential sum estimates in [23, Lemma 2.2, Theorem 2.3] can be replaced with the simpler estimate in [23, Theorem A.1]; also, the quantitative bounds in Theorem 1.2 should improve if one uses this approach. However, for more general choices of $g_1, g_2$, the coefficients $c_p$ are essentially arbitrary unit phases, and the frequency set $\Xi_H$ need not be contained within major arcs.


* As an alternative proof of this estimate, one can use standard Fourier-analytic manipulations to rewrite the left-hand side of (3.21) as $aH \sum_{p_1,p_2,p_3,p_4 \in \mathcal{P}_H:\ p_1+p_2 = p_3+p_4} \frac{c_{p_1} c_{p_2} \overline{c_{p_3}}\, \overline{c_{p_4}}}{p_1 p_2 p_3 p_4}$, which by the triangle inequality is bounded in magnitude by $O_{a,\varepsilon}(H^{-3} \sum_{p_1,p_2,p_3,p_4 \in \mathcal{P}_H:\ p_1+p_2 = p_3+p_4} 1)$. This sum may be upper bounded using a standard upper bound sieve for the primes (e.g. the Selberg sieve) to be $O_\varepsilon(H^3/\log^4 H)$, giving (3.21).

4. Further remarks


It is natural to ask if the arguments can be extended to higher point correlations than the $k = 2$ case, for instance to bound sums such as the three-point correlation

$$\sum_{x/\omega < n \le x} \frac{\lambda(n) \lambda(n+1) \lambda(n+2)}{n}. \tag{4.1}$$

Most of the above arguments carry through to this case. However, the “bilinear” left-hand side of (3.20) will be replaced by a “trilinear” expression such as

$$\sum_{p \in \mathcal{P}_H} \frac{1}{p} \sum_{j \in \mathbb{Z}/H\mathbb{Z}} x_{1,j} x_{2,j+p} x_{3,j+2p}.$$
These sorts of sums have been studied in the ergodic theory literature [?], [?]. Roughly speaking, the analysis there shows that these sums are small unless one has a large Fourier coefficient $G_1(\xi)$ for some $\xi \in \mathbb{Z}/H\mathbb{Z}$. However, in contrast to the previous argument in which $\xi$ was restricted to a small set $\Xi_H$ (which, crucially, was independent of $\mathbf{n}$), one now has no control whatsoever on the location of $\xi$. As such, one would now need to control maximal averaged exponential sums such as

$$\frac{1}{X} \int_X^{2X} \sup_\alpha \left| \frac{1}{H} \sum_{x \le n \le x+H} \lambda(n) e(\alpha n) \right| dx, \tag{4.2}$$

which (as pointed out in [23]) are not currently covered by the existing literature (note carefully that the supremum in $\alpha$ is inside the integral over $x$). However, this appears to be the only significant obstacle to extending the results of this paper to the $k = 3$ case, and so it would certainly be of interest to obtain non-trivial estimates on (4.2). Note however that if one replaces $\lambda(n)$ with $n^{it}$, then the expression (4.2) exhibits essentially no cancellation for $t$ almost as large as $X^2$ (as opposed to the condition $t = O(X)$ that naturally appears in the $k = 2$ analysis). Similarly for the variant

$$\sum_{x/\omega < n \le x} \frac{n^{it} (n+1)^{-2it} (n+2)^{it}}{n}$$

of (4.1). This suggests that in order to establish cancellation in (4.1) and (4.2), one must somehow go beyond the techniques in [21], [23], as these techniques do not exclude the problematic multiplicative functions $n \mapsto n^{it}$ for $t$ between $x$ and $x^2$.

For even higher values of $k$, one has to now control quartilinear and higher expressions in place of (3.20). Using the literature from higher order Fourier analysis (in particular the inverse theorem in [15], together with transference arguments from [9], [?], or [33]), one is now faced with the task of controlling sums even more complicated than (4.2), in which the linear phases $n \mapsto e(\alpha n)$ are now replaced by more general nilsequences of higher step (which one then has to take the supremum over, before performing the integral); this task can be viewed as a local version of the machinery in [?], [?], and will be carried out in detail in [?]. Of course, since satisfactory control on (4.2) is not yet available (even if one inserts logarithmic averaging), it is not feasible at present to control higher step analogues of (4.2) either. However, one can hope that if a technique is found to give good bounds on (4.2), it could also extend (in principle at least) to higher step sums.

It is of course of interest to remove the logarithmic averaging from Theorem 1.2 or Theorem 1.3. It appears difficult to do this while utilising the entropy decrement argument, because this argument involves a scale $H$ which cannot be specified in advance, but is produced through a variant of the pigeonhole principle. However, it may be possible to estimate expressions such as (1.5) for a specified $H$ without resorting to the entropy decrement argument, by establishing some sort of expander graph property for the random graph $G_{n,H}$ (or some closely related graph) from the introduction, and then there would be some chance of removing the logarithmic averaging. Unfortunately we were unable to establish such an expansion property, as the edges in the graph $G_{n,H}$ do not seem to be either random enough or structured enough for standard methods of establishing expansion to work.